1 Executive Summary

As  the  computing  industry  matures,  it  carries  with  it  the  immense  burden  of  maintaining  the 
flexibility  and  performance  of  decades’  worth  of  legacy  code.  Legacy  programs  often  have  little 
in  common  with  today's  development  practices;  they  were  written  in  different  language  dialects 
and  targeted  a  different  class  of  computer  architectures.  As  recent  trends  demand  that  modern 
software  be  written  with  explicit  parallelism  to  harness  the  power  of  multicore  architectures,  it  is 
becoming largely intractable to manually upgrade a legacy application to modern performance
standards. 

We  study  the  technology  innovations  required  to  radically  improve  the  process  of  understanding 
and  parallelizing  performance-critical  legacy  application  code.  We  demonstrate  the  usefulness 
and  feasibility  of  such  a  system,  dubbed  “Program  Reincarnation”,  using  a  simple  prototype.  A 
Program  Reincarnation  tool  will  assist  the  programmer  in  replacing  the  program's  code  (“the 
body”)  while  preserving  the  original  specification  (“the  soul”).  Our  technique  originally  focused 
on  streaming  applications  such  as  multimedia,  graphics,  and  signal  processing;  we  employ  a 
combination  of  static  and  dynamic  program  analysis  to  extract  the  simple,  high-level  block 
diagram  from  the  optimized  and  obfuscated  legacy  code.  Our  comprehensive  approach  is  broadly 
applicable  to  program  understanding,  documentation,  refactoring,  and  automatic  parallelization. 
2 Introduction


Over  the  last  few  decades,  programmers  have  written  a  staggering  amount  of  code.  These 
billions  of  lines  of  code  have  profoundly  impacted  the  human  race.  Today,  programs  are 
everywhere - in computers, cell phones, cars, cameras, and cash registers. While computer
hardware and technology are going through an incredibly rapid growth period, in many areas
computer  code  seems  to  enjoy  a  longevity  mainly  associated  with  a  mature  field.  It  is  not 
uncommon  to  actively  use  computer  programs  written  two  decades  ago.  Furthermore,  most  new 
software  includes  a  large  amount  of  program  code  written  long  ago.  For  example,  the  Microsoft 
Windows  vulnerability  MS-03-011  affected  Windows  95  to  Windows  2003,  suggesting  that  the 
code  written  before  1995  was  still  in  use  in  a  program  produced  8  years  later.  In  contrast,  it  is 
hard  to  even  find  a  working  computer  older  than  seven  years,  let  alone  a  modern  computer  built 
with  parts  designed  two  years  ago. 

This  reliance  on  old  code,  written  using  old  languages,  outdated  methods,  and  targeting  now- 
defunct  machines,  is  creating  a  massive  obstacle  to  the  rapid  growth  of  computers.  Most  of  these 
legacy  programs  cannot  take  full  advantage  of  the  exponential  performance  growth  rate  of 
modern processors. Some of the performance-critical legacy programs, ones that were highly
optimized for the architecture of the day, will experience slowdowns and compatibility issues on
newer processors. Legacy code has had an even larger negative impact on computer architecture.
As  legacy  applications  are  extremely  important,  commercial  microprocessors  have  been 
“wasting”  the  transistor  budgets  offered  by  Moore’s  Law  to  improve  the  performance  of  legacy 
programs  at  any  cost,  even  if  the  performance  gains  are  only  marginal.  Monolithic  superscalar 
processors  are  a  good  example  of  this  trend. 

Recently,  processor  vendors  have  started  to  break  this  trend  by  moving  to  multicore 
architectures.  Multicore  architectures  provide  much  better  peak  performance  per  die  area,  but 
require  programs  to  be  explicitly  parallel.  This  trend,  while  providing  much  higher  performance 
to modern programs, puts legacy applications at a disadvantage as they will no longer get
performance  improvements  from  new  generations  of  processors. 

In order to take advantage of modern multicore processors, one must rewrite these legacy
binaries using modern languages and techniques. Apart from performance, there are many other
benefits  of  updating  a  legacy  code  base.  Most  of  these  programs  are  written  in  older  languages 
without  the  benefit  of  many  powerful  language  features.  Furthermore,  during  the  last  decade  we 
have  made  huge  strides  in  software  engineering.  Rewriting  a  legacy  code  base  using  these 
techniques  can  make  them  more  efficient,  malleable,  portable,  secure,  and  fault  tolerant. 

However,  rewriting  a  legacy  program  is  a  daunting  task.  While  the  application  may  be  in  wide 
use,  it  is  very  difficult  and  time  consuming  to  reverse  engineer  the  exact  algorithm  implemented 
and  any  special  cases  handled  by  the  application.  The  original  programmers  are  no  longer 
available  in  many  cases.  Information  that  helped  the  original  programmer  such  as  specification 
and  requirement  documents,  simulations,  mathematical  proofs,  tests  etc.  may  have  been  lost.  In 
some  cases,  even  the  source  code  may  not  be  available.  Even  if  the  source  is  available,  the  tool 
chain  and  the  libraries  have  often  diverged  over  the  years,  making  it  impossible  to  build  the 
application  from  source. 

Today,  recreating  the  specification  of  a  legacy  application  is  more  error  prone  and  takes  longer 
than  the  coding  task  itself.  We  demonstrate  how  to  drastically  reduce  the  cost  of  recreating  a 
program. A Program Reincarnation system provides a tool chain to help the programmer replace
the  program  code  (‘the  body’)  while  adhering  to  the  original  program’s  specification  (‘the  soul’). 
This  tool  chain  takes  advantage  of  the  availability  of  the  source  code  and  a  working  program  to 
help  guide  the  programmer  through  the  reincarnation  process.  The  DoD  modernization  effort  can 
hugely  benefit  from  this  capability  by  reducing  the  software  porting  cost  associated  with  most 
hardware  upgrades. 

Streaming  applications  amplify  the  difficulties  of  the  current  recreation  process  while  also 
providing  tantalizing  possibilities  for  drastic  reduction  of  programmer  effort.  Most  streaming 
applications  are  performance  critical.  Thus,  programmers  were  forced  to  hand  optimize  them  for 
the  architecture  of  the  day,  making  it  virtually  impossible  to  understand  the  underlying 
algorithm.  Furthermore,  streaming  algorithms  do  not  naturally  fit  in  to  old  programming  models, 
requiring  complicated  scheduling  and  buffer  management  that  further  obfuscate  the  original 
algorithm.  Most  of  the  streaming  algorithms  were  originally  developed  in  prototyping 
environments  such  as  Matlab.  Unavailability  of  these  intermediate  representations  that  helped  the 
original  programmer  further  complicates  the  extraction  of  the  underlying  algorithm.  However, 
most  of  the  streaming  algorithms  correspond  to  simple  block  diagrams  with  minimal  control 
flow.  There  are  very  few  special  cases  in  streaming  algorithms.  Thus,  once  the  underlying 
algorithm  is  discovered,  it  leads  to  a  simple  and  relatively  straightforward  representation. 

The overall design flow in a comprehensive system for program reincarnation is given in Figure 1.
The prototype implementation studied here includes the minimal subset necessary to
demonstrate  end-to-end  flow.  The  design  flow  is  controlled  by  the  Assisted  Application 
Reincarnation  Tool.  First,  the  execution  profile  of  the  instrumented  binary  as  well  as  static 
analysis  of  the  source  code  is  fed  to  an  inference  engine.  This  engine  builds  an  application 
knowledge  base  with  the  annotated  streaming  representation  of  the  program.  This  knowledge 
base  is  used  for  many  tasks.  The  streaming  block  diagram  and  the  derived  specifications  are 
presented  to  the  programmer.  In  addition,  the  programmer  is  provided  with  hints  on  refactoring 
and  domain  specific  transformations  of  the  program.  When  possible,  the  system  will  attempt  to 
automatically  parallelize  the  application.  Appropriate  tests  are  generated  to  help  discover  the 
program  invariants  as  well  as  check  the  compliance  of  the  reincarnated  application. 
Figure 1. Design Flow for Program Reincarnation

3 Methods, Assumptions, and Procedures

In  this  section,  we  identify  and  investigate  the  technologies  needed  for  a  program  reincarnation 
tool. 

3.1  Study  of  Technology  Needs  and  Innovations 

We investigated methods that use sophisticated static analysis of source code; however, we
found that information from static analysis alone is not sufficient. This is mainly due to the
complexity of C programs and the obfuscation introduced by hand optimizations. We could not find
any existing compiler that was able to extract useful high-level information from the complex C
programs in our benchmark set.

We  have  investigated  the  structure  and  capabilities  of  using  an  application  knowledge  base.  We 
identified  the  application  information  required  for  program  reincarnation  and  identified  an 
application  knowledge  base  format. 


We  also  studied  the  technology  innovations  needed  for  test  generation  to  support  program 
reincarnation.  For  example,  dynamically  detected  invariants  reveal  the  properties  of  the 
program's  execution  over  the  test  suite.  Such  information  can  be  used  in  feedback-directed 
random  testing  generation.  In  addition,  the  inferred  invariants  can  also  be  used  as  a  type  of 
coverage  metric  such  as  for  test  selection  and  prioritization;  more  coverage  yields  a  better  test 
suite.  These  test  generation  and  evaluation  techniques  enhance  the  soundness  of  dynamic 
analysis  and  help  programmers  have  better  understanding  of  the  legacy  code  for  program 
reincarnation. 
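
The idea of using inferred invariants as a coverage metric for test selection can be read as a greedy covering problem. The following is a minimal sketch, not the method used in this study; it assumes each test has already been mapped to the set of invariants it exercises, and the test names and invariant strings are hypothetical.

```python
def select_tests(coverage):
    """Greedily pick tests until every observed invariant is covered.

    coverage: dict mapping test name -> set of invariant labels.
    Returns the selected test names in selection order.
    """
    remaining = set().union(*coverage.values()) if coverage else set()
    selected = []
    while remaining:
        # Pick the test covering the most still-uncovered invariants;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & remaining))
        selected.append(best)
        remaining -= coverage[best]
    return selected


# Hypothetical data: three tests and the invariants each one exercises.
example = {
    "t1": {"x >= 0", "len(buf) == 1152"},
    "t2": {"x >= 0"},
    "t3": {"ptr != NULL", "len(buf) == 1152"},
}
print(select_tests(example))
```

Here t2 adds no invariant that t1 does not already cover, so a prioritization scheme of this kind would run it last or drop it.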

We  also  studied  program  refactoring  opportunities. 

3.2  Assisted  Application  Reincarnation  Tool  (AART) 

AART  is  the  nerve  center  of  program  reincarnation.  We  demonstrated  the  feasibility  of  AART 
by  showing  that  our  simple  annotations  can  easily  be  added  by  the  programmer.  We  created  a 
process  for  extracting  coarse  grained  stream  data  flow  from  existing  C  programs  to  parallelize 
these  programs  by  taking  advantage  of  streaming  parallelism.  We  also  developed  a  “global  view” 
of  the  program  behavior  that  can  be  extracted  from  existing  programs. 

3.3  Binary  Interpretation  and  Instrumentation 

Binary interpretation and instrumentation are critical for gathering the invariant information.
We studied the extensions required for current binary instrumentation tools such as DynamoRIO,
Pin, and Valgrind. We developed the binary interpretation and instrumentation tool using the
Valgrind system. This analysis tool can gather the necessary profile information and extract the
data dependence patterns from legacy applications. Our tool interprets every program instruction
and recognizes which partition it belongs to. We maintain a table that, for each memory location,
holds the identity of the program partition that last wrote to that location. On encountering a
store instruction, the partition writing to the location is recorded. Likewise, on every load
instruction, a table lookup is performed to determine the partition that produced the value being
consumed. Every unique producer-consumer relationship is recorded in a list and output at the
end of the program, along with the stream graph and communication macros.
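
The last-writer table described above can be illustrated with a short sketch. This is not the Valgrind-based tool itself; it replays a pre-recorded list of memory events, and the event format, partition names, and addresses are assumptions made for illustration. Skipping loads that stay within one partition is also an assumption.

```python
def trace_dependences(events):
    """Replay a memory trace and collect producer-consumer pairs.

    events: iterable of (op, partition, address) tuples, where op is
    "store" or "load". Returns the unique (producer, consumer) pairs
    in first-seen order.
    """
    last_writer = {}  # address -> partition that last stored to it
    seen = set()
    pairs = []
    for op, partition, address in events:
        if op == "store":
            last_writer[address] = partition
        elif op == "load":
            producer = last_writer.get(address)
            # Skip loads of never-written locations and loads that stay
            # inside one partition (assumed not to be communication).
            if producer is not None and producer != partition:
                pair = (producer, partition)
                if pair not in seen:
                    seen.add(pair)
                    pairs.append(pair)
    return pairs


# Hypothetical trace for a three-partition decoder.
trace = [
    ("store", "parse", 0x1000),
    ("load", "idct", 0x1000),
    ("store", "idct", 0x2000),
    ("load", "display", 0x2000),
    ("load", "idct", 0x1000),  # repeated relationship, recorded once
]
print(trace_dependences(trace))
```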

3.4  Inference  Engine 

We  believe  that  many  shortcomings  of  static  program  analysis  and  automatic  parallelization  can 
be  mitigated  by  observing  the  computations  performed  by  the  application  at  runtime,  and 
performing  machine  learning  to  generalize  its  observations.  The  outputs  of  the  generalizations 
can  be  the  basis  of  a  program  specification  given  to  AART.  We  have  built  a  simple  inference 
engine  to  determine  the  pipeline  and  data  parallel  sections.  It  is  possible  to  build  a  more  powerful 
inference engine using the Daikon system, a state-of-the-art, artificial-intelligence-based
system that can dynamically detect likely high-level program invariants. Those program
invariants are useful in program understanding and can be used to infer communication patterns, which
are  critical  to  (semi)automatic  parallelization  and  program  reincarnation. 
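
The general idea behind dynamic invariant detection can be sketched as follows: start from a set of candidate properties and discard each one as soon as an observed program state falsifies it. This is only an illustration of the principle and does not use or reproduce Daikon's interface; the candidate set, variable names, and sample values are hypothetical.

```python
# Hypothetical candidate invariants over two observed variables x and y.
CANDIDATES = {
    "x >= 0": lambda s: s["x"] >= 0,
    "x <= y": lambda s: s["x"] <= s["y"],
    "y == 2 * x": lambda s: s["y"] == 2 * s["x"],
}


def likely_invariants(samples):
    """Return the candidates that hold on every observed sample."""
    surviving = dict(CANDIDATES)
    for sample in samples:
        for name in list(surviving):
            if not surviving[name](sample):
                del surviving[name]  # falsified by this observation
    return sorted(surviving)


# Hypothetical variable values recorded at one program point.
observed = [{"x": 1, "y": 2}, {"x": 3, "y": 6}, {"x": 0, "y": 5}]
print(likely_invariants(observed))
```

The surviving properties are only "likely" invariants: they hold on the inputs observed, which is why the quality of the test suite matters for the soundness of the result.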

3.5  Streaming  Representation 

In the prototype system we use a simple streaming intermediate representation, loosely based on
the MIT StreamIt compiler. While our primary focus was on streaming applications, we also
studied the source code of five open-source Java projects. We analyzed qualitatively and
quantitatively the change patterns that developers have used in order to retrofit concurrency. We
found that retrofitting concurrency is not a one-time event but a continuous process. The
first  motivation  for  retrofitting  concurrency  is  often  to  increase  the  responsiveness,  and  then  later 
the  throughput  of  an  application.  As  the  application  matures  and  makes  more  use  of  concurrency, 
the  predominant  changes  fall  into  fixing  concurrency  errors,  fine-tuning,  and  improving  the 
scalability.  Given  the  importance  and  the  length  of  such  transformations,  tool  developers  should 
consider  (semi)automation  for  each  stage  in  the  concurrency  lifecycle  in  order  to  improve 
programmer  productivity. 

Many  application  domains  have  a  rich  set  of  domain  specific  idioms,  program  representation 
standards  and  program  transformation  opportunities.  For  the  domain  of  streaming  applications, 
the  steady-state  communication  pattern  is  regular  and  stable,  even  if  the  program  is  written  in  a 
language  such  as  C  that  resists  static  analysis.  We  employ  a  dynamic  analysis  to  trace  the 
communication  pattern  between  program  partitions,  which  is  used  to  construct  a  stream  graph  for 
the application as well as a detailed list of producer-consumer instruction pairs, both of which aid
program  understanding  and  help  track  down  any  problematic  dependences. 
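
Once producer-consumer pairs between partitions are known, a stream graph follows directly. The sketch below shows one plausible way to assign a pipeline stage to each partition by longest path from the sources; it is not taken from the prototype, it assumes the graph has no feedback edges, and the partition names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def pipeline_stages(pairs):
    """Assign each partition a stage number (1 = source) from its edges.

    pairs: iterable of (producer, consumer) tuples.
    Raises graphlib.CycleError if the graph contains a feedback edge.
    """
    preds = {}
    for producer, consumer in pairs:
        preds.setdefault(producer, set())
        preds.setdefault(consumer, set()).add(producer)
    stage = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        stage[node] = 1 + max((stage[p] for p in preds[node]), default=0)
    return stage


# Hypothetical stream graph with one fork and one join.
edges = [
    ("parse", "idct"),
    ("parse", "motion"),
    ("idct", "display"),
    ("motion", "display"),
]
stages = pipeline_stages(edges)
print(stages, "depth:", max(stages.values()))
```

A cycle reported here would correspond to a dependence that flows backward through the pipeline, which is exactly the kind of problematic dependence the producer-consumer list is meant to help track down.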


3.6  Block  Diagram  and  Specification 

We  have  built  a  tool  that  can  display  the  parallel  regions  and  the  communication  pattern  in  a 
programmer  friendly  block  diagram  representation.  Examples  of  stream  graphs  for  MPEG-2  and 
MP3  appear  in  Figure  2.  The  stream  graph  presents  a  coherent  high-level  block  diagram  of  the 
application  to  the  programmer. 
(a) MPEG-2 (b) MP3

Figure 2. Visualizations Generated by AART That Show the
Communication Pattern of MPEG-2 and MP3


3.7  Automatic  Parallelization 

Automatic  parallelization  is  a  critical  component  in  this  process.  If  automatic  parallelization  is 
successful,  it  will  drastically  reduce  the  work  required  by  the  programmer.  However,  decades  of 
intense  research  have  not  achieved  fully  automatic  parallelization.  The  tool  we  built  performs  a 
partial  automatic  parallelization.  Using  the  streaming  representation,  the  program  is  decomposed 
into  distinct  execution  threads  and  mapped  to  a  multicore  architecture.  The  prototype  employs 
lightweight  programmer  annotations,  directed  by  the  tool,  to  achieve  a  semi-automatic  mapping. 

3.8  Application  Study 

We  have  shown  that  our  tool  can  extract  parallelism  out  of  six  real  life  legacy  programs.  Three 
of  these  are  traditional  stream  programs  (MPEG-2  decoding,  MP3  decoding,  GMTI  radar 
processing),  and  three  are  SPEC  benchmarks  (parser,  bzip2,  hmmer).  The  characteristics  of  the 
generated  parallel  stream  graphs  and  performance  results  on  a  four-core  machine  are  shown  in 
Table  1.  Our  analysis  extracts  a  useful  block  diagram  for  each  application,  and  the  parallelized 
versions offer a 2.78x mean speedup on a 4-core machine. Speedups range from 2.03x (MPEG-2)
to 3.89x (hmmer).

Table 1. Characteristics of the Parallel Stream Graphs and
Performance Results on a Four-core Machine

Benchmark    Pipeline Depth   Data-Parallel Widths   Speedup
GMTI         9                ---                    3.03x
MPEG-2       7                ---                    2.03x
MP3          6                2,2                    2.48x
197.parser   3                4                      2.95x
256.bzip2    3,2              7                      2.66x
456.hmmer    2                4                      3.89x
GeoMean                                              2.78x


Data-parallel  width  refers  to  the  number  of  ways  any  data-parallel  stage  was  replicated. 
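
The GeoMean row of Table 1 can be reproduced from the six per-benchmark speedups. The short check below uses only the values listed in the table; the function name is ours.

```python
import math

# Speedups copied from Table 1.
speedups = {
    "GMTI": 3.03,
    "MPEG-2": 2.03,
    "MP3": 2.48,
    "197.parser": 2.95,
    "256.bzip2": 2.66,
    "456.hmmer": 3.89,
}


def geometric_mean(values):
    """Geometric mean computed as exp of the mean of the logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(f"{geometric_mean(speedups.values()):.2f}x")  # prints 2.78x
```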
3.9  Workshop  on  Software  Forensic  Environments 

We held a workshop at MIT on February 27, 2008 to discuss software forensic environment
concepts: legacy code reuse, abstraction and representation, portability, and parallelization.
The workshop was attended by the following people:

Table 2. February 27, 2008 Workshop Attendees

Name                     Affiliation
Saman Amarasinghe        MIT
Ras Bodik                Berkeley
Bill Harrod              DARPA
Regina Barzilay          MIT
Jon Hiller               STA
Vivek Sarkar             Rice
Robert Miller            MIT
Ralph Weischedel         BBN
Dawson Engler            Stanford
George Heineman          WPI
David Padua              UIUC
Bill Thies               MIT Student
Guang Gao                Delaware
Vikram Chandrasekhar     MIT Student
Una-May O'Reilly         MIT
Jason Ansel              MIT Student
Doug Post                HPC
Marek Olszewski          MIT Student
Rick Pancoast            Lockheed
Michael Gordon           MIT Student
Craig Rasmussen          LANL
Danny Dig                MIT
Cornell Wright           LANL
Milissa Benincasa        BRSC
Bob Chambers             Northrop
Michael Van De Vanter    SUN
Alfred Scarpelli         AFRL
Daniel J. Quinlan        LLNL
James Anderson           Lincoln

Doug Post, Craig Rasmussen, Cornell Wright, and Bob Chambers described the legacy code
problems their respective institutions have faced in the past. After a description of ideas from the
software forensics environment study (the Reincarnation of Streaming Applications study), three
breakout groups worked on the following: a problem description that a potential software forensics
environment research effort would attempt to solve; a technology roadmap to support software
forensics environment research and associated innovation and development; and software forensics
environment milestones that could be the basis for research and development and used to
evaluate technical progress. After further discussion, this information was refined and a final set
of slides was prepared by the groups (see Appendix A). The workshop highlighted the
importance of the legacy code issues and why we think we will be able to solve this problem.

3.10  Prototypes 

We have built a simple prototype that can instrument an existing C program, extract data
movement  by  executing  the  program,  analyze  the  data  movement  to  extract  the  streaming  data 
patterns,  report  these  patterns  to  the  user  using  a  graphical  interface,  add  annotations  to 
parallelize  the  stream  components  and  finally  parallelize  the  program. 
4 Results and Discussion


In  order  to  demonstrate  the  feasibility  of  understanding  and  extracting  the  underlying  parallelism 
from  legacy  code,  we  implemented  an  end-to-end  system  that  takes  existing  legacy  C  programs 
and,  with  minimal  programmer  help,  extracts  the  parallelism.  We  focused  on  streaming 
applications  such  as  video,  audio,  and  digital  signal  processing,  which  are  often  described  in 
documentation  by  a  block  diagram  with  a  fixed  flow  of  data. 

To  exploit  pipeline  parallelism  using  our  system,  the  programmer  annotates  the  natural 
boundaries  of  pipeline  partitions,  and  then  our  system  records  all  communication  across  those 
boundaries  during  a  training  run.  This  communication  trace  is  converted  to  a  stream  graph  that 
shows  the  high-level  structure  of  the  algorithm  as  well  as  a  list  of  producer/consumer  statements 
that  can  be  used  to  trace  down  problematic  dependences.  If  the  programmer  is  satisfied  with  the 
parallelism  in  the  graph,  he  recompiles  the  annotated  program  against  a  set  of  macros  that  are 
emitted  by  our  analysis  tool.  These  macros  serve  to  fork  each  partition  into  its  own  process  and 
to  communicate  the  recorded  locations  using  pipes  between  processes. 
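
The fork-and-pipe structure that the emitted macros set up can be illustrated in miniature. The sketch below is not the tool's C macros; it is a two-partition pipeline written for POSIX systems, in which the producer partition runs in a forked child and sends its values to the consumer partition over a pipe. The stage computations (doubling, then adding one) and the 4-byte integer encoding are arbitrary choices made for illustration.

```python
import os
import struct


def run_pipeline(values):
    """Run a two-stage pipeline across two processes joined by a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: the producer partition (stage 1 doubles values).
        status = 1
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "wb") as out:
                for v in values:
                    out.write(struct.pack("<i", v * 2))
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code
    # Parent process: the consumer partition (stage 2 adds one).
    os.close(write_fd)
    results = []
    with os.fdopen(read_fd, "rb") as inp:
        while True:
            chunk = inp.read(4)
            if len(chunk) < 4:  # end of stream: producer closed the pipe
                break
            results.append(struct.unpack("<i", chunk)[0] + 1)
    os.waitpid(pid, 0)
    return results


print(run_pipeline([1, 2, 3]))
```

Because the two partitions are separate processes, they share no memory after the fork; every value the consumer needs must travel through the pipe, which is why the recorded list of communicated locations has to be complete.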

We  have  applied  our  methodology  to  six  case  studies:  MPEG-2  decoding,  MP3  decoding,  GMTI 
radar  processing,  and  three  SPEC  benchmarks.  Our  tool  was  effective  at  parallelizing  the 
programs,  providing  a  mean  speedup  of  2.78x  on  a  four-core  architecture.  Despite  the  potential 
unsoundness  of  the  tool,  our  transformations  correctly  decoded  ten  popular  videos  from 
YouTube,  ten  audio  tracks  from  MP3.com,  and  the  complete  test  inputs  for  GMTI  and  SPEC 
benchmarks. 
5 Conclusion

To  summarize,  this  work  makes  the  following  contributions: 

-  We  have  shown  that  for  the  class  of  streaming  applications,  pipeline  parallelism  is  very  stable. 
Communication  observed  at  the  start  of  execution  is  often  preserved  throughout  the  program 
lifetime, as well as other executions. While the code can be complicated, the underlying
communication  pattern  is  simple  and  is  amenable  to  extraction. 

-  We  have  defined  a  simple  API  for  indicating  potential  pipeline  parallelism  in  the  program. 
Comparable  to  threads  for  task  parallelism  or  OpenMP  for  data  parallelism,  this  API  serves  as  a 
fundamental  abstraction  for  pipeline  parallelism. 

-  We  developed  a  dynamic  analysis  tool  for  tracking  producer/consumer  relationships  between 
coarse-grained  program  partitions.  The  tool  outputs  a  stream  graph  of  the  application,  which 
validates  or  refutes  the  parallelism  suggested  by  the  programmer.  It  also  provides  a  detailed 
statement-level  trace  and  a  set  of  macros  for  automatic  parallelization. 

-  We  applied  our  methodology  to  six  case  studies,  encompassing  MPEG-2  decoding,  MP3 
decoding,  GMTI  radar  processing,  and  three  SPEC  benchmarks.  We  extracted  meaningful  stream 
graphs of each application, and achieved a 2.78x mean speedup on a four-core architecture.
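
The stability property named in the first contribution can be checked mechanically: communication learned during an initial training window is stable if no later iteration introduces a producer-consumer pair outside it. The following is a minimal sketch under that reading; the per-iteration edge sets and partition names are hypothetical, and this is not the validation procedure used in the case studies.

```python
def is_stable(iterations, warmup):
    """Check whether edges seen after warmup were all seen during warmup.

    iterations: list of sets of (producer, consumer) pairs, one set per
    steady-state iteration. warmup: number of leading iterations used
    for training.
    """
    learned = set().union(*iterations[:warmup])
    later = set().union(*iterations[warmup:])
    return later <= learned


# Hypothetical per-iteration communication for a three-partition program.
observed_iterations = [
    {("a", "b")},
    {("a", "b"), ("b", "c")},
    {("b", "c")},
]
print(is_stable(observed_iterations, warmup=2))  # all later edges already learned
print(is_stable(observed_iterations, warmup=1))  # ("b", "c") appears only later
```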
6 Recommendations

It  is  clear  that  legacy  program  code  is  an  extremely  important  issue  for  US  competitiveness  and 
national  security.  The  conclusion  of  the  software  forensic  environment  workshop  is  that  the 
advent  of  multicore  and  other  technologies  will  make  the  legacy  problem  even  more  acute  as 
applications  will  be  forced  to  restructure  because  of  these  technologies.  Thus,  finding  a  solution 
to  the  legacy  problem  is  of  great  national  importance. 

We  observed  that,  today,  we  are  closer  to  finding  a  viable  technological  solution  for  this 
problem. However, any viable solution will have to combine aspects of multiple emerging
technologies  in  different  areas  such  as  program  understanding,  analyses  and  compilation, 
program  verification  and  testing,  and  human  interfaces.  This  will  require  a  substantial  effort  and 
will  need  to  bring  together  researchers  from  these  separate  communities. 
List of Acronyms, Abbreviations, and Symbols

Acronym   Description
AART      Assisted Application Reincarnation Tool
DoD       Department of Defense
MPEG-2    MPEG-2 video decoder; M(oving) P(icture) E(xperts) G(roup) - 2 is a
          standard for coding of moving pictures and associated audio information
MP3       MP3 audio decoder; MP3 (MPEG-1 Audio Layer 3) is a digital audio
          encoding format
GMTI      Ground Moving Target Indicator
bzip2     Compression and decompression algorithm
hmmer     Calibrating HMMs for biosequence analysis
HMM       Hidden Markov Model
SPEC      Standard Performance Evaluation Corporation
Appendix A Final Slides from Software Forensic Environment Workshop


SFE Workshop

8:00-8:30     Breakfast
8:30-10:05    Description of the Legacy problem from the DOD/DOE user community
  8:30-9:00     Doug Post, HPC
  9:00-9:30     Craig Rasmussen and Cornell Wright, LANL
  9:30-9:55     Bob Chambers, Northrop Grumman
  9:55-10:05    Rick Pancoast, Lockheed Martin
10:05-10:15   Break
10:15-11:00   SFE: current thinking and plan of action (Bill & Saman)
11:00-noon    Breakout working groups
                Problem Description (D407)
                Technology Roadmap (this room)
                Milestones and Evaluation (D451)
noon-1:00     Lunch
1:00-2:00     Breakout working groups continued
2:00-3:30     Feedback from the working groups
3:30-3:45     Break
3:45-4:45     General discussion
4:45-5:00     Wrap-up
SFE: Software Forensics Environment
Legacy Code

•  310  billion  lines  of  legacy  code  in  industry  today 

-  60-80%  of  typical  IT  budget  spent  re-engineering  legacy  code 

-  (Source:  Gartner  Group) 

• An even bigger problem for the DoD

- Lifetimes of systems are much longer in DoD

- Mission-critical code was highly optimized for the original
machine

•  Now  code  must  be  migrated  to  multicore  machines 

-  Current  best  practice:  manual  translation 
SFE: Software Forensics Environment

*  Solve  the  problem  of  modernizing  the  legacy  code  base 

*  Create  a  program  within  DARPA  IPTO  to  spearhead  the 
innovation  of  necessary  technology 

*  The  main  idea  of  the  program 

-  Invent  the  technologies  needed  to  transform  a  legacy  program  into  a 
“modernized  replica” 

• The modernized replica behaves identically to the original program

* Goals of the program

- The cost of transformation is greatly reduced (by 100x to 1000x)

- Errors and deviations of the transformed program are reduced (eliminated?)

- The modernized replica can be trivially mapped into multiple current
architectures

-  Modernized  replica  can  take  advantage  of  multicore  and  other  forms  of 
parallelism  in  modern  architectures. 

-  Modernized  replica  is  easy  to  understand  and  manage 


Outline 
SFE Transformation Process


Start  with  the  existing  program 

-  Assumes  that  we  have 

•  Compilable/buildable  source 

•  Working  executable  +  test  suite 

Produce  a  Modernized  Replica 

Holistic  process  involving  tools  and  programmers 

-  Each  team  proposes  a  process 

-  The  process  can  involve 

•  Static  program  analysis  and  model  checking 

•  Dynamic  analysis  and  testing 

•  Tools  to  assist  the  programmer  (HCI) 

•  Learning 

•  Automated  and  assisted  program  transformation 

•  Programmer  education  and  training 


Program  Reincarnation: 
A  Holistic  Approach 

Dynamic  analysis 


-  Managed  program  execution 

-  Program  invariant  inference 

-  Application  knowledge  database 

Assisted  parallelization 

-  GUI  tool 

Correctness  in  reincarnated 

-  Test  Generation 

-  Divergence  Analysis 

Static  analysis 

-  Automatic  parallelization 

-  info  for  program  understanding 

Learn  about  the  domain 

-  Flag  domain  specific  issues 

-  Generate  domain-specific  hints 

Bring  programs  to  modern  age 

-  Block  diagram 

-  Refactoring  identification 


[Figure: Legacy Source File, Original Compiler, Original Binary, Block Diagram Representation]


Modernized  Replica 

•  Functionally  equivalent  to  the  original  program 

-  Original  program  as  the  specification 

•  Convertible  to  an  executable  form 

-  Preferably  an  automated  path  to  multiple  platforms 

•  Will  provide  the  following  benefits 

-  Ability  to  easily  create  an  executable  for  multiple  modern  computer  systems 

-  Ability  to  extract  performance  from  modern  computer  systems 

•  multi-processor,  multicore  and/or  SIMD  parallelism 

•  new  memory  systems 

-  Restructured  to  make  it  adhere  to  modern  software  engineering  practices 

- Exposes the high-level structure and provides better documentation to make it more
programmer-comprehensible

- Simplifies maintenance and provides the capability to expand the program.

• SFE supports many choices (each team proposes one)

- Examples: StreamIt, Matlab, etc.
Breakout Sessions


Small  working  groups  (6  to  12) 

In  each  group  appoint 

-  A  Moderator 

-  A  Scribe 

In  the  first  session 

-  Try  to  define  what  should  be  done  in  the  program 

-  Bring  questions  and  discussion  points  on 
cross-cutting  issues  to  lunch 

Working  lunch 

-  Discuss  some  of  these  issues  (1  slide  summary  by  each  scribe) 

In  the  second  session 

-  Get  into  the  detailed  description  (specification) 

In  the  general  discussion  session 

-  Each  group  gets  20  minutes  to  present 


I  Problem  Description 

•  A  clear  and  crisp  description  of  the  legacy  problem 

•  What  should  be  the  scope  of  SFE? 

•  Should  SFE  be  confined  to  a  single: 

-  Class  of  programs  and/or  Input  language 

-  IDE 

-  Language  of  the  modern  replica 

•  Domain  of  the  legacy  code 

-  FORTRAN,  C,  MPI,  or  chosen  by  the  proposer? 

- Signal processing, simulations or chosen by the proposer?

-  Should  we  select  a  set  of  benchmarks? 

•  Programs  with  both  a  legacy  version  and  a  hand  modernized  version? 

•  Types of Modernized Replica

-  Leave it for the teams to propose a format?

•  MATLAB, C with libraries, StreamIt, etc.


Room  D407 
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II  Technology  Roadmap 

•  What  technologies  should  be  part  of  SFE? 

•  For  each  technology 

-  Why  is  it  not  done  today? 

-  What  can  be  reasonably  achieved  during  the  program? 

-  How  can  that  drastically  improve  the  SFE  process? 

•  Examples 

-  Use  unsound  techniques,  give  “mostly  sound”  suggestions  to  the  user 

-  Use  the  working  program  as  a  prototype 

•  static  and  dynamic  techniques  to  generate  a  “specification” 

•  Help test the modernized replica for compliance with the working program

-  Use  natural  language  processing  type  techniques  to  identify  similar/interesting 
parts  in  a  large  code  base 

-  Use  learning  to  identify  repetitive  and  frequent  tasks  and  provide  hints  to  the  user 

•  This  Room  (Star) 
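
The "working program as a prototype" idea can be sketched as differential testing: run the legacy code and the modernized replica on the same randomly generated inputs and report any input on which they disagree. This is only an illustrative sketch. The two moving-sum functions and the input generator are hypothetical stand-ins, not part of any SFE tool.

```python
import random
from typing import Callable, List, Sequence, Tuple


def legacy_moving_sum(samples: Sequence[int], window: int) -> List[int]:
    """Hypothetical stand-in for legacy code: hand-optimized running sum."""
    out: List[int] = []
    acc = 0
    for i in range(len(samples)):
        acc += samples[i]
        if i >= window:
            acc -= samples[i - window]  # drop the sample leaving the window
        if i >= window - 1:
            out.append(acc)
    return out


def replica_moving_sum(samples: Sequence[int], window: int) -> List[int]:
    """Hypothetical modernized replica: states the intent directly."""
    return [sum(samples[i:i + window]) for i in range(len(samples) - window + 1)]


def find_deviations(
    legacy: Callable[[Sequence[int], int], List[int]],
    replica: Callable[[Sequence[int], int], List[int]],
    trials: int = 200,
    seed: int = 0,
) -> List[Tuple[List[int], int]]:
    """Treat the legacy program as the specification and collect disagreements."""
    rng = random.Random(seed)
    deviations = []
    for _ in range(trials):
        samples = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        window = rng.randint(1, 5)
        if legacy(samples, window) != replica(samples, window):
            deviations.append((samples, window))
    return deviations


if __name__ == "__main__":
    print(len(find_deviations(legacy_moving_sum, replica_moving_sum)), "deviations")
```

Random testing of this kind is unsound in the sense used above: an empty deviation list raises confidence but does not prove equivalence.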




III  Milestones  and  Evaluation 


•  Reduced cost (by 100x to 1000x)

-  If  the  process  is  fully  automated  ->  trivial 

-  Programmer  involvement  ->  user  studies? 

•  Can  we  find  a  set  of  applications  with  original  legacy  version  and  a  version  modernized 
using  current  practices  that  can  be  used  as  the  base  case? 

•  Reduction  of  errors  and  deviations 

-  What  is  a  good  measurement? 

•  Modernized replica maps trivially onto multiple modern architectures

-  “trivially map”: Automated tools, no programmer intervention

-  “multiple modern architectures”: at least one distributed memory multicore (Cell or Tile64) and one shared memory multicore (Core 2 Duo or Niagara)

•  Modernized  replica  is  efficient  and  effective 

-  Show  speedups  against  the  original  program  on  that  architecture 

-  Show  scalability  from  one  core  to  max  number  of  cores  available 

•  Modernized replica is easy to understand and manage

-  How  to  measure  quality  of  the  specifications  created? 

-  How  to  measure  malleability  and  extendibility  of  original  vs.  modernized  replica? 

•  Room  D451 
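
The speedup and scalability milestones can be made concrete with a small calculation. The sketch below assumes wall-clock runtimes have been measured for the replica at several core counts and for the original program on the same machine. The function name and the sample timings are hypothetical.

```python
from typing import Dict


def scaling_report(runtimes: Dict[int, float], legacy_seconds: float) -> Dict[int, Dict[str, float]]:
    """Summarize replica performance per core count.

    runtimes maps core count -> replica wall-clock seconds (must include 1 core).
    legacy_seconds is the original program's runtime on the same architecture.
    """
    one_core = runtimes[1]
    report = {}
    for cores in sorted(runtimes):
        t = runtimes[cores]
        report[cores] = {
            "speedup_vs_legacy": legacy_seconds / t,  # speedup against the original program
            "self_speedup": one_core / t,             # scalability from one core
            "efficiency": one_core / (t * cores),     # fraction of ideal linear scaling
        }
    return report


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    timings = {1: 100.0, 2: 50.0, 4: 25.0, 8: 20.0}
    for cores, row in scaling_report(timings, legacy_seconds=80.0).items():
        print(cores, row)
```

Reporting both ratios keeps the two milestones separate: a replica can scale well against itself and still be slower than the original on one core.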
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Breakout  Session  Assignments 

I  Problem  Description  (D407) 

II  Technology  Roadmap  (this  room) 

III  Milestones  and  Evaluation  (D451) 


Name                     AM     PM
Saman Amarasinghe        roam   roam
Vivek Sarkar             M&E    TR
Bill Harrod              roam   roam
Ralph Weischedel         TR     TR
Jon Hiller               M&E    M&E
George Heineman          TR     TR
Robert Miller            M&E    M&E
Bill Thies               PD     TR
Dawson Engler            TR     TR
Vikram                   M&E    TR
David Padua              PD     PD
Jason Ansel              TR     PD
Guang Gao                PD     TR
Marek Olszewski          TR     M&E
Una-May O'Reilly         TR     TR
Michael Gordon           M&E    TR
Doug Post                PD     M&E
Danny Dig                TR     PD
Rick Pancoast            PD     PD
Milissa Benincasa        PD     PD
Craig Rasmussen          PD     PD
Michael Van De Vanter    M&E    TR
Cornell Wright           TR     M&E
Daniel J. Quinlan        TR     TR
Bob Chambers             M&E    M&E
Al Scarpelli             PD     PD
James Anderson           TR     M&E
Regina Barzilay          TR
Ras Bodik                TR     TR
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What  is  the  Problem? 


•  Billions  of  dollars  are  currently  invested  in 
operational  systems 

-  National  Security  and  Economic  Interests  exist  to  sustain 
the  operational  capabilities  of  these  systems 

-  Software  lifetime  for  these  systems  is  20+  years 

•  The  lifetime  of  hardware  installations  for  current 
DOE  Systems  is  18  -  36  months 

-The  legacy  software  applications  must  run  on  the  next 
generation  architecture  as  well  as  previous  versions 

•  Currently  DOE  legacy  software  is  ported  to  the 
next  generation  architecture  by  hand 

-This  is  expensive  and  a  time  intensive  process 

>  Estimated current cost to rewrite code is $100.00 per line, and a single DOE application averages half a million lines. The cost to rewrite one application is therefore 50 million dollars.

MANUAL  PROCESS  IS  ERROR  PRONE! 
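
The cost estimate on this slide is a single multiplication, written out below so the inputs can be varied. The constants are the slide's figures. The function name is only illustrative.

```python
COST_PER_LINE_USD = 100.00               # slide's estimated rewrite cost per line
AVERAGE_LINES_PER_APPLICATION = 500_000  # slide's average DOE application size


def rewrite_cost(lines_of_code: int, cost_per_line: float = COST_PER_LINE_USD) -> float:
    """Cost of a manual rewrite, assuming a flat per-line rate."""
    return lines_of_code * cost_per_line


if __name__ == "__main__":
    print(f"${rewrite_cost(AVERAGE_LINES_PER_APPLICATION):,.0f}")  # prints $50,000,000
```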


MB  -  5/2/2007 
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What  is  the  Objective? 


•  Maintain  and  extend  DOE  flagship 
applications  in  support  of  National  Security 
interests  and  Economic  competitiveness 

-Need  the  capability  of  transforming  these  applications  to 
run  on  new  hardware  architectures 

-Need  tools/techniques  to: 

>  Reduce  errors  in  porting/transforming  the  application 

>  Reduce  application  development  costs 

>  Improve  application  code  maintainability 

-Keep  acceptable  performance 


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Program  Approach 


•  Define  high-level  program  abstraction 
which  will  allow  the  transformation  of 
legacy  applications 

•  Develop  semi-automated  mechanisms  to 
transform  legacy  applications  into  the 
modern  replica 

•  Define  and  create  analysis  tools  to  aid 
humans  to  understand  and  preserve  the 
knowledge  contained  in  the  legacy 
application 

•  Develop  tools  to  ensure  program 
correctness  throughout  the 
transformation  process 

•  Develop  tools  to  assist  in  the  maintenance 
of  transformed  programs 


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NORTHROP  GRUMMAN 


Workshop 


IC  Conversion  Issues  into  the  new  ODOE 
Environment 


Robert  Chambers 

Business  Development  Enterprise  Architect 

Mission  Systems 

Northrop  Grumman  Corporation 


Overview 


■  IC  Transformation  Overview 

■  Overview  of  current  processing  paradigm 

■  Explain  current  architecture 

■  Director  of  National  Intelligence  (DNI)  Framework 

■  New  Driving  requirements 

■  Evolution  vs.  Revolution 

■  How  to  convert  monolithic  stove  pipe  APIs  into 
component  services  that  leverage  the  new 
technologies 

■  Discussion 
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I C  Transformation  Overview 


■  The  intelligence  community  (IC)  is  in  the  process  of 
revolutionizing  its  computing  infrastructures  to  exploit 
innovations  in  emerging  system  paradigms,  such  as  service- 
oriented  and  event-driven  architectures. 

This  development  permits  consolidation  and  integration  of 
diverse  systems  to  dramatically  increase  intelligence  asset 
value  and  enable  new  levels  of  autonomic  cross-intelligence 
collaboration  and  system  intelligence. 

■  The goal is to produce intelligent autonomic systems that
are virtual, event driven, and secure in a globally
distributed environment.

■  The unique challenges in servicing the future IC are:

Continual  High-bandwidth  data  ingest  rates  with  low  latency 
timing  requirements 

■  Complex-event  processing  (CEP) 

■  Knowledge  management  and  retention 

■  Intelligence  analyst  process  consolidation  and  automation 

■  Intelligent  network  development 
Fielding  intelligent  secure  networks 
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Extending  the  SOA  model  out  to  2020 
adds  the  following  new  capabilities: 


■  A  development  environment  that  is  fully  service-oriented  with 
complete  registry  integration  and  governed  via  integrated 
workflow  management. 

■  Security  that  is  included  from  service  concept  to  service 
retirement  as  an  integral  part  of  the  SOA. 

■  Security  becomes  a  set  of  services  within  the  SOA. 

■  An  autonomic  on-demand  operating  environment  that  is  self- 
aware  and  self-managing. 

It  load-balances  and  optimizes  the  infrastructure  to  meet 
service  contracts  and  quality-of-service  agreements  on  the  fly. 

■  High-performance grid computing with high-speed,
externalized, shared-memory buses between computing nodes
with high redundancy and failover.

■  An  intelligent  network  with  intelligent  application-aware 
routers  that  route  global  messaging  traffic  for  optimal 
performance. 

■  Services  that  can  dynamically  configure  themselves  to  meet 
objectives  as  in  grammar-oriented  architectures. 
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Current  I C  Architectural  Overview 

“The  Stovepipe  Paradigm” 
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Current  I C  Processing  Paradigm 

■  Hundreds  of  Megabytes/Sec  to  Gigabytes/Sec 
continually  (24X7X365) 

■  Many  Petabytes  of  Storage 

■  SAN  and  NAS 

■  Structured  (Database) 

■  Unstructured  (Flat  files  with  Metadata) 

■  Product  timeliness  in  the  minutes 

■  Very  I/O  Intensive  algorithmic  monolithic  processing 

■  Extensive  use  of  SMP  Virtual  Memory 

■  Super  Computers  packed  full  of  CPUs  to  get 
enough  memory  to  build  the  Virtual  Memory 

■  Memory  managed  through  project  developed 
mechanisms 

■  Hundreds  of  Systems 

■  Millions  of  lines  of  code  (Fortran,  C,  C++,  JAVA) 
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Example  Current  Process 


[Figure: process flow diagram. Recoverable labels: IC; New Work; Determine Scope; Program Management; Systems Engineering; Block; Operational Problems; Money, Definition, Schedule; List of possible work packages; Problem Reports; OOA, OOP, UT; Operations]
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Processing  Example 

(Adding  a  New  Branch  to  the  object  requires  "Everyone"  to  change!) 


Raw  Data  Ingest 


Compressed  Object  Tree 
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New  Driving  requirements 


■  The IC Stove Pipe architecture arose for a number of reasons


■  The  result  has  manifested  into  Processing  Centers  that  have  no 
floor  space,  have  Power  &  Cooling  issues,  and  staggering  TCO  to 
maintain  an  ever  increasing  variety  of  ageing  hardware  equipment 
and  software  licenses. 

■  With decreasing congressional funding, less and less money is
being allocated to keep up with emerging threat technologies.

■  The IC has no choice but to modernize.

■  The Federal Enterprise Architecture (FEA) and the Director of
National Intelligence (DNI) are the governing bodies for the
facilitation of this change

■  http://www.whitehouse.gov/omb/egov/a-2-EAModelsNEW2.html 
http://www.dni.gov/ 
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Open 


Integrated 


Core Enterprise Services


Collaboration 

Enterprise  Portal 

Content Discovery & Delivery

Evolution  vs.  Revolution 



...an  approachable,  adaptive,  integrated  and 
reliable  infrastructure  delivering  on 
demand  services  for  on  demand  business 
operations ... 

Virtualized  Autonomic 


The  Chicken  or  the  Egg! 
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How  Do  We  Get  There? 
How  Do  We  Manage  It? 


Systems 

Software  interoperability 
Duplication  of  functions 
Data  element  standardization 
Common  operating  environment 
COTS 

Host  applications  locally 
Heavy  clients 


Tomorrow 

Service  providers  and  consumers 
Business  process  integration 
Optimization  and  specialization 
Standardized  business  products 
Integrated  networked  environment 
Commercial  service  providers 
Provide  service  globally 
Application  delivery  over  network 


Service  Oriented  Architecture 




Mainframes  App  Servers 
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Multi-lnt  Core  Reference 


ure 
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Summary 


[Figure: summary network diagram. Recoverable labels: Assured Net Cached; Battlespace Reachback and Smart Pull; Regional and Globally Distributed Processing; Collaboration Services; TSAT; Data; Secure Intelligence Community Network Convergence Layer; MAN]


Note: 

MAN  =  Metropolitan  Area  Network 

TSAT  =  Transformational  Communications  Satellite 
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The  I C  has  World  Class  Processing  Problems 


■  The 2020 IC SOA described here is much more than today’s
notion of a service-oriented architecture.

■  The 2020 SOA vision:

■  Is a service-oriented, event-driven, virtualized, grid computing
fabric that is knowledgeable and self-aware

■  Manages the many complexities of advanced computing
infrastructures automatically with minimal human intervention

■  Is self-optimizing and self-adjusting to meet the ever-changing
needs of the IC enterprise

■  Can be easily expanded and adjusted by humans to
encompass new mission needs

■  This revolutionary way of doing business in the IC will be
procured and implemented over the next few years.

■  Once operational, the IC SOA/grid/EDA will continue to
evolve as new technologies are plugged in and played in
support of the evolving intelligence environment.
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Multi-Stage  Transformation  is  Required 


1.  Build  the  Revolutionary  Infrastructure 

2.  Wrap Legacy Processing so that it runs as a Service
on the new infrastructure

3.  Decompose  Legacy  Processing  into  functional 
elements 

■  Abstract  functional  elements  into  encapsulated 
services  available  for  reuse 

■  Model  encapsulated  services  into  late  binding 
compound  services  and  generate  BPEL 

■  Build  Workflow  Plans  to  perform  an  end-to-end 
thread 

■  Test  and  Validate 

4.  De-Commission  Wrapped  Legacy  API 

5.  New  processing  becomes  operational 
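
Steps 2 through 5 can be illustrated with a toy example. This is a minimal sketch under stated assumptions: `ServiceRegistry`, `legacy_process`, and the three decomposed functions are hypothetical names, and the `compose` helper merely stands in for a workflow engine (the slide mentions BPEL, which is not modeled here).

```python
from typing import Any, Callable, Dict, List


def legacy_process(raw: str) -> str:
    """Hypothetical monolithic legacy routine: parse, filter, and format in one call."""
    values = [int(tok) for tok in raw.split(",")]
    kept = [v for v in values if v >= 0]
    return ";".join(str(v) for v in kept)


class ServiceRegistry:
    """Toy stand-in for the new service infrastructure (step 1)."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, handler: Callable[[Any], Any]) -> None:
        self._services[name] = handler

    def call(self, name: str, payload: Any) -> Any:
        return self._services[name](payload)


# Step 2: wrap the legacy routine so callers reach it only through the registry.
registry = ServiceRegistry()
registry.register("process", legacy_process)


# Step 3: decompose the legacy processing into reusable functional elements.
def parse(raw: str) -> List[int]:
    return [int(tok) for tok in raw.split(",")]


def keep_non_negative(values: List[int]) -> List[int]:
    return [v for v in values if v >= 0]


def render(values: List[int]) -> str:
    return ";".join(str(v) for v in values)


def compose(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Chain elements into a compound service (an end-to-end thread)."""
    def pipeline(payload: Any) -> Any:
        for step in steps:
            payload = step(payload)
        return payload
    return pipeline


new_process = compose(parse, keep_non_negative, render)

# Test and validate: the compound service must match the wrapped legacy service.
sample = "3,-1,4,-1,5"
assert registry.call("process", sample) == new_process(sample)

# Steps 4 and 5: decommission the wrapper and make the new processing operational.
registry.register("process", new_process)
print(registry.call("process", sample))  # prints 3;4;5
```

Because callers only ever use the registry name, the swap in the last step does not require them to change.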


221/2008 
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Technology  Needs 


Wide-Area 

Network 


Global 

Information 

Grid 


Joint 

Force 

Commander 


Computing  e-Utility 


Distributed  Database, 
File,  and  Application 
Data  Sources 


Fanner 


Portal 


Joint  Task 
Force, 
Coalition 
Interagency 
Forces 


4 

Joint  Task 
Force, 
Component 
Headquarters 
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ON  DEMAND  OPERATI NG  ENV1 RONMENT 


Portal 


This  is  really  hard 
to  do  and  is  key  to 
future  success! 


Enterprise  Service  Bus 


Resource  Management 


Data  Management 


Network  Management 


General  Purpose  Processors  Storage  Area  Networks 


LAN/WAN 
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Event- Driven  Applications 


Event  processing  and 
analysis  are  critical  to  the 
formulation  of  sound 
intelligence. 

The IC’s job is to predict
and prevent negative
complex events such as
those of 9/11.

Two emerging fields,
EDA and CEP, will be
an integral part of the
IC’s 2020 SOA.

EDA  applications  can  be  sorted 
into  four  categories:  the  first 
three  are  aimed  at  engineering 
better  intelligence  systems; 
the  fourth,  CEP,  is  aimed  at 
expanding  insight: 


[Figure: the four EDA categories and their supporting layers. Recoverable labels: Complex-Event Processing; Event-based Business Process Management; Event-enabled Processes; Mediated Events; Simple Events; Intelligence Feeds; Publish; Business Process Management; Manage Process; Service Provider; Message-oriented Middleware, Web Services, Enterprise Service Bus, e-Mail; Knowledge and Information Stores]
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Simple  Events 


■  Simple  EDA,  where  application  programs  explicitly 
send  and  receive  messages  directly  to  and  from 
each  other 

■  For  example,  through  message-oriented 
middleware  or  Web  services. 


■  This  is  the  publish  and  subscribe  model. 
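
A minimal in-process sketch of the publish and subscribe model is shown below. It is not an ESB or message-oriented middleware; the `EventBus` class and the topic names are hypothetical and exist only to show the decoupling between publishers and subscribers.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class EventBus:
    """Toy publish/subscribe bus: publishers and subscribers share only a topic name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> int:
        """Deliver the event to every subscriber of the topic; return the delivery count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    bus = EventBus()
    knowledge_store: List[Any] = []
    bus.subscribe("intelligence.input", knowledge_store.append)
    bus.publish("intelligence.input", {"id": 1, "kind": "report"})
    print(knowledge_store)
```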


Intelligence  Publish 




Knowledge  & 
Information 
Stores 


Inputs 


Event 


Subscribe 


■  SOA  ESBs  are  good  at  this! 


Knowledge  & 
Information 
Stores 
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I  ntegration  Broker 


■  EDA  mediated  by  integration  brokers,  which 
transform  and  route  simple-event  messages 
according  to  logical  rules.  This  can  be  viewed  as 
rule-based  event  processing. 
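
Rule-based event processing of this kind can be sketched as a list of rules, each pairing a predicate with a transformation and a destination. The `Rule` and `IntegrationBroker` names, the `priority` field, and the queue names are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class Rule:
    matches: Callable[[Event], bool]     # logical condition on the incoming event
    transform: Callable[[Event], Event]  # reshape the message for the destination
    destination: str                     # queue that receives the transformed message


class IntegrationBroker:
    """Toy broker: applies every matching rule to each simple-event message."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = list(rules)
        self.queues: Dict[str, List[Event]] = {}

    def route(self, event: Event) -> List[str]:
        delivered = []
        for rule in self.rules:
            if rule.matches(event):
                self.queues.setdefault(rule.destination, []).append(rule.transform(event))
                delivered.append(rule.destination)
        return delivered


if __name__ == "__main__":
    broker = IntegrationBroker([
        Rule(lambda e: int(e.get("priority", 0)) >= 5,
             lambda e: {**e, "flagged": True},
             "alerts"),
        Rule(lambda e: True, lambda e: e, "archive"),
    ])
    print(broker.route({"id": 1, "priority": 7}))
    print(broker.route({"id": 2, "priority": 1}))
```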




Event-enabled  processes 


■  EDA  directed  by  business-process-management 
(BPM)  engines,  which  conduct  the  end-to-end  flow 
of  a  multi-step  process  using  BPM-oriented  events. 


Intelligence 

Publish

Inputs

Event 


Business 

Process 


Event 


Subscribe  Management 


There  are  a  limited 
number  of 
commercial 
products  available 

First  form  of 
autonomic  processing 


Knowledge  & 
Information 
Stores 
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This  is  really  hard 
to  do  and  is  key  to 
future  success! 


Complex-event  processing 

■  CEP  applications,  where  a  sophisticated  event 
manager  or  network  of  event-processing  agents 
logically  evaluates  multiple  events  from  one  or  more 
event  streams  to  provide  better  insight  for  sense- 
and-respond  applications  and  business-activity 
monitoring. 

■  This type of monitoring is used for signal analysis,
security vigilance, and related functions. The
processing can occur in any intelligent device (e.g.,
grid computing nodes or intelligent network devices).
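
As a rough illustration of complex-event processing, the sketch below raises a complex event when enough distinct sources report within a sliding time window. It is one simple event-processing agent under assumed semantics. The class name, the window length, and the threshold are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class DistinctSourceDetector:
    """Signal a complex event when `threshold` distinct sources report within `window` seconds."""

    def __init__(self, window: float, threshold: int) -> None:
        self.window = window
        self.threshold = threshold
        self._recent: Deque[Tuple[float, str]] = deque()

    def observe(self, timestamp: float, source: str) -> bool:
        """Record one simple event (timestamps must be non-decreasing)."""
        self._recent.append((timestamp, source))
        # Drop events that have fallen out of the sliding window.
        while self._recent and timestamp - self._recent[0][0] > self.window:
            self._recent.popleft()
        distinct_sources = {src for _, src in self._recent}
        return len(distinct_sources) >= self.threshold


if __name__ == "__main__":
    detector = DistinctSourceDetector(window=10.0, threshold=3)
    for t, src in [(0.0, "a"), (4.0, "b"), (8.0, "a"), (9.0, "c"), (25.0, "a")]:
        print(t, src, detector.observe(t, src))
```

The point of the example is that no single simple event triggers the result. Only the combination across the event stream does.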


Intelligence 

Publish

Inputs

Event- 


Event 


Event 
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Subscribe 

We  are  having 
to  build  as 
custom  applications 

Need  standards  based 

Autonomic  capabilities 


Processing  Filter;  Aggregate 
Agent 


Business 

Activity 

Monitoring 


Rules 


Event- 
Processing 
Agent 
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LOCKHEED  MARTIN 


SFE:  Software  Forensics  Environment  A 
DoD  Systems  Integrator  Perspective 

SFE  Workshop 

Prepared  for  the  SFE  Workshop  and  DARPA IPTO  at  MIT 


Presented  by:  Rick  Pancoast 
856-722-2354 
rick.pancoast@lmco.com 
Lockheed  Martin  MS2  -  Moorestown,  NJ 
28  February  2008 
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Legacy  DoD  Code 




•  Legacy  DoD  Code  Exists  in  Many  Languages,  Some 
Obscure 

•  Today,  C  and  C++  are  common 

•  Legacy:  Ada,  Jovial,  CMS-2,  FORTRAN,  etc. 

•  TADSTAND  C  (Tactical  Digital  Standard,  ~  1990) 
Mandated  all  DoD  Code  be  Written  in  Ada  (HOL)  or 
Assembly  Language  (including  CMS-2  [UYKs]) 

•  This  is  why  it  is  the  way  it  is 

•  Needed  SECDEF  dispensation  to  deviate  from  the  TADSTAND 

•  With  the  DoD  Push  for  COTS  and  Open  Architecture 
(OA),  there  has  Been  a  180°  Turnabout 

•  TADSTANDS  are  no  longer  invoked 

But  the  “Mess”  is  Out  There 
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DoD  Code  Conversion  Has  Been  Done 




•  DoD  Has  Used  Code  Conversion  Software,  and  Hand  Conversion 

•  Johns Hopkins APL CMS-2 to Ada Used Experimentally in late 80’s

•  Command  &  Control:  CMS-2  to  C++  (Open  C&D);  Hand  Ada  to  C++ 

•  Weapons:  Ada  to  Java  (hand  conversion) 

Commercial  Code  Conversion  is  also  Rampant: 

•  Java  to  Visual  C#  [Microsoft  Java  Language  Conversion  Assistant  2.0] 

•  C  to  C++  [Free  Software  Foundation] 

•  C  to  VHDL  is  Popular 

•  Datatek  (Business  partner  with  IBM  and  Sun) 

Provides  “Language  Conversion  Services” 

•  TSRI,  Many  Others  .  .  . 

•  No  One  is  Really  Addressing  the  Multicore  Issue 

•  Application  Software  needs  to  be  Mapped  Efficiently  to  Multiple  Processors 

Code  Conversion  Has  Been  Used  by  DoD  - 
And  it  Does  Work . . .  But . . . 


Source Language: Assembler, Basic, C, C++, COBOL, Fortran, Pascal, PL/I, Other

Target Language: C#, C++, COBOL, Java, .NET, Other
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DoD  Code  Conversion  and  Validation 




•  Code  Conversion  is  Probably  the  Easy  Part 

•  Code  Can  be  Quickly  Checked  for  Proper  Functionality 

•  The  Tough  Part  is  Verifying  and  Validating  the  Converted 
Code  -  to  the  Same  Pedigree  as  the  Original  Code 

•  A  Significant  Portion  of  Development  Cost  is  Validation  and  Verification 

•  Regression  Testing  Can  be  Very  Costly  (Error  Branches,  etc.) 

•  Automated  Regression  Testing  (as  Part  of  the  Conversion  Process)  Would 
be  Extremely  Valuable 

•  SFE  Can  Provide  a  Valuable  Tool  for  DoD 

•  Code  Conversion  (with  Multicore  -  Multiprocessor  Target  Architecture) 

•  Converted  Code  Verification  and  Validation 

•  Regression  Testing 

Sample  DoD  Code  Can  Be  Used  for  Verification 
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Ferlamufttie  (I.Sxli 
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$2.0 


$1.0 
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Why  Is  DoD  Concerned  with 

Embedded Software?

Courtesy of Dr. Jeremy Kepner, MIT Lincoln Lab

[Figure: Estimated DoD expenditures for embedded signal and image processing hardware and software ($B). Legend: Software, Hardware]

Source: “HPEC Market Study” March 2001

•  COTS  acquisition  practices  have  shifted  the  burden  from  “point  design” 
hardware  to  “point  design”  software 

•  Software  costs  for  embedded  systems  could  be  reduced  by  one-third  with 
improved  programming  models,  methodologies,  and  standards 

MIT  Lincoln  Laboratory 
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To  Port  Or  Not  To  Port.  There 

Is  No  Question*. 


Douglass  Post  —  DoD  High  Performance 
Computing  Modernization  Program 

Robert  Gold  —  DoD  Defense  Research  and 

Engineering 

MIT  Computer  Science  Dept./DARPA  Workshop  on  Code  Porting/Reuse 
Feb  28,  2008,  MIT  Computer  Science  Dept.,  Cambridge,  MA 


*Apologies  to  W.  Shakespeare 
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DoD  HPC  Modernization  Program 


HPC  Centers 


ARSC 

♦ 


AHPCRC  ♦ 


55,000  processors 


330 TFlops - 2008
560 TFlops - 2009


ASC  ★ 


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♦  SMDC 


%MHPCC 
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★  MSRCs 

♦  ADCs 


>  Supercomputers 
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Software  Applications 
Support 
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Education  &  Outreach 
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Exponential  Growth  In  Supercomputer  Speed  And 
Power  Is  Making  It  A  “Disruptive”  Technology. 


Enable  paradigm  shift 

•  Potential  to  change  the 
way  problems  are 
addressed  and  solved 

•  Make  reliable 
predictions,  about  the 
future* 

•  Superior  engineering  & 
manufacturing 

•  Enable  research  to 
make  new  discoveries 

•  A  vastly  more  powerful 
solving  methodology! 


[Figure: supercomputer speed and power by year, 1960 to 2010]


Computer  power  comes  at  the  expense  of  complexity! 


*  Apologies  to  Yogi  Berra 
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The  future  is  exa-flops/s  (1018  Flops) 


Extrapolation to 2020
(1-10 GFlops/core)
2000: 7.2 TFLOPs/s
~5000 cores
2010: 2x10^3 TFLOPs/s
10^5-6 cores
2020: 10^6 TFLOPs/s
10^8-10 cores
How do we program
for 10^8-10 cores?
Especially if the
cores are different?
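
The core counts in this extrapolation follow from dividing the system rate by the per-core rate. The sketch below reproduces that arithmetic using the slide's figures; the function name is illustrative. At the stated 1-10 GFlops/core, the 2020 figure works out to between 10^8 and 10^9 cores, so the upper end of the slide's range would correspond to cores slower than 1 GFlops, which is an inference and not something the slide states.

```python
def cores_needed(system_tflops: float, gflops_per_core: float) -> float:
    """Number of cores required to reach a system rate, given a per-core rate."""
    return system_tflops * 1_000 / gflops_per_core  # 1 TFLOP/s = 1,000 GFLOP/s


if __name__ == "__main__":
    # 2000: 7.2 TFLOPs/s on ~5000 cores implies about 1.44 GFlops per core.
    print(7.2 * 1_000 / 5_000)
    # 2010: 2x10^3 TFLOPs/s at 10 and at 1 GFlops/core.
    print(cores_needed(2e3, 10), cores_needed(2e3, 1))
    # 2020: 10^6 TFLOPs/s (one exaflop/s) at 10 and at 1 GFlops/core.
    print(cores_needed(1e6, 10), cores_needed(1e6, 1))
```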


[Figure: Computing power for the world's fastest computer, by year]


The  DoD  replaces  its  supercomputers  every 

four  years! 


HPCMP  Modernizes  DoD  computing  with  $50M  annual  purchases. 


MSRC Systems                         Number of   HABU Rating    Memory   No. of     Peak     Equiv. of
                                     Actual PEs  per 1,024 PEs  (GB)     Avail PEs  GFLOPS   1,024-PE HABU

ERDC
SGI Origin 3900                          1,024        3.08       1,024     1,008     1,434       3.08
Cray XT3 (Upgrade)                       8,320       11.54      16,640     8,192    43,264      93.76
Cray Hood                                8,848       10.39      17,696     8,608    40,701      89.76

NAVO
IBM Regatta P4                           2,944        6.55       5,968     2,832    20,019      18.83
IBM Cluster 1600 P5                      2,976       12.31       5,952     2,816    20,237      35.78
IBM Cluster 1600 P5                      1,504       13.66       3,008     1,408    10,227      20.06
IBM Regatta P4                           1,408        2.10       1,408     1,328     7,322       2.89
IBM Regatta P4                             512        6.55         736       464     3,482       3.28

SGI Altix Cluster (D)                      256        8.68         256       256     1,536       2.17
IBM Opteron Cluster                      2,372        4.73       3,456     2,304    10,437      10.96
Linux Networx Xeon Cluster               2,100        5.80       4,096     2,048    12,852      11.89
Linux Networx Woodcrest Cluster          4,286       16.07       8,572     4,160    51,432      67.26
Linux Networx Dempsey Cluster            3,360       10.86       6,720     3,336    21,504      35.63
Linux Networx Cluster                      256        5.21         256       256     1,567       1.30

IBM Regatta P4 (D)                          32        2.55          32        32       166       0.08
SGI Origin 3900                          2,048        3.08       2,048     2,032     2,867       6.16
SGI Origin 3900 (D)                        128        1.90         128       128       179       0.24
HP Opteron Cluster                       2,048        6.71       4,096     2,048    10,650      13.42
SGI Altix Cluster                        2,048        6.84       2,048     2,000    12,288      13.68
SGI Altix 4700 (Density)                   256       12.02       1,024       250     1,638       3.00
SGI Altix 4700 (8192 2GB Density,
  1024 4GB Memory)                       9,216       12.02      22,528     9,000    58,982     108.14

MSRC Totals                                                               54,506   332,784     541.4
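
The HABU-equivalent figure listed for each system appears to equal the number of actual PEs divided by 1,024, multiplied by the HABU rating per 1,024 PEs. That relationship is inferred from the listed numbers and is not stated on the slide. A short check against three of the listed systems:

```python
def habu_equivalent(actual_pes: int, rating_per_1024_pes: float) -> float:
    """Assumed relationship: scale the per-1,024-PE rating by the machine's size."""
    return actual_pes / 1024 * rating_per_1024_pes


# (system, actual PEs, HABU rating per 1,024 PEs, listed HABU equivalent)
LISTED = [
    ("Cray XT3 (Upgrade)", 8320, 11.54, 93.76),
    ("IBM Regatta P4", 2944, 6.55, 18.83),
    ("HP Opteron Cluster", 2048, 6.71, 13.42),
]

if __name__ == "__main__":
    for name, pes, rating, listed in LISTED:
        computed = habu_equivalent(pes, rating)
        print(f"{name}: computed {computed:.2f}, listed {listed:.2f}")
```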

12/2007 

10/6/2009 
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OPERATIONS  PER  SECOND  (OPS) 


LLNL and LANL have had a new supercomputer roughly every 3
years since 1943 and a new programming paradigm every 10-15 years


1940  1950 


1960 


1970  1980 

YEAR 


1990 


Post  and  Cook,  2000 

Machines  number  1953 

UN  I  VAC/650 
IBM  701  (Fixed) 

IBM  702(2-Drums) 

IBM  709 
LARC 

STRETCH  7030 
IBM  7094 
CDC  3600 
CDC  6600 
CDC  7600 
CDC  Star 
Cray  1 
Cray  XMP 
Cray  YMP 
Meiko  CS-2 
Cray  J-90 
DEC  8400  5/300 
DEC  8400  5/440 
DEC  Tera  Cluster 
DEC  Compass  Cluster 
DEC  Forest  Cluster 
Compaq  TeraCluster  TC 
Compaq  SC  Cluster 
IBM  SP  ID 
IBM  ASCI  SKY 
IBM  ASCI  Blue 
IBM  ASCI  White 
SGI  Origin2000 
Sun  Sunbert _ 

Assembly  language 
Fortran 
C 

C++ 
batch 

Octopus/LTSS/CTSS 
UNIX 
Serial 
Pipeline 
Vector 
Parallel 

small,  slow,  drums 
50-100  kwords 


Joe  Requa,  Doug  Post 


2000 


— James  Mercer-Smith 


2010  small  core,  large  core 
larger,  faster 
shared  memory 
distributed  memory 
clusters  of  shared  memory 


Paradigm  shifts 


Operating  Systems 


Memory  Structure 


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How  do  we  get  to  the  Brave  New  World? 

Brand  new  codes  or  improvements  of  existing  codes? 


Developing  new  codes  is  challenging! 

-  Requires  large  (10  to  30  professionals),  multi-disciplinary,  multi- 
institutional  teams 

-  Takes  5  to  10  years 

-  Requires  extensive  verification  and  validation 

-  Requires  a  transition  path  to  the  user  community 

-  How  many  people  would  use  Windows  if  almost  everyone  else 
used  Mac  OS  or  LINUX  or  UNIX...? 

*  For  engineering  codes,  the  practical  approach  is  to  port/upgrade 
existing  tools  and  develop  new  ones  where  necessary 

There’s  no  practical  alternative  to  porting 

-  Independent  software  vendors  are  porting  very  slowly 

-  “Reuse”  is  essential,  a  different  use  of  “reuse” 

-  Reuse  the  code,  not  individual  components  in  other  apps 
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Three  Challenges 

Performance,  Programming  and  Prediction 


1.  Performance Challenge - Computer power is increasing through
growing complexity

-  Massive parallelization, multi-core & heterogeneous (CELL, FPGA,
GPU...) processors, complex memory hierarchies

2.  Programming Challenge - Programming for complex computers

-  Rapid development of codes with good performance

3.  Prediction Challenge - Developing predictive codes with complex
scientific models

-  Develop accurate predictive codes

•  Verification

•  Validation

•  Code Project Management

Better software development and production tools are desperately
needed for us to take full advantage of computers

Train wreck coming between the last two
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Computational  Engineering  Requires 

a  Village! 


We  need  a  complete  problem  solving 

capability: 


Computers 

Codes 

V&V 

Users 

Sponsors 
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KPC 


Staffing  levels 


Developing  a  Large,  Multi-scale,  Multi-effect  Code  Takes 

a  Large  Team  a  Long  Time 


[Figure: Falcon Project Life Cycle. Staffing levels versus calendar time (years, 0 to 35). Phases: initial development; product improvement and development; production, product development and user support phase, with continued product testing (V&V) and application by users; retirement (user support, minimal development, minimal porting). Annotations: major product releases; serious testing by customers; 2003]
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The  process  is  complex! 

[Figure: Computational Science Workflow. Recoverable label: Optimize Component]

Not  the  WaterFall  Model! 


— D.  E.  Post,  R.  P.  Kendall,  Large-Scale  Computational  Scientific  and  Engineering  Project  Development  and  Production  Workflows,  CTWatch  (2006),  vol.2-4B,pp68-76. 
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The  Process  has  large  risks!* 

Code  Project  Schedule  For  Six  Large-scale  Physics  Codes 


[Figure: schedule timeline. Program Planning and Start (1992-1995); New Code Projects Launched (1996); Program Milestones Set; Milestones 1st, 2nd, 3rd (1999, 2000, 2001). Projects shown: Egret, Jabiru, Falcon, Kite, Finch, and Gull Code Projects. Marker: Project Start]
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Computational  Science  Demands  A  New  Paradigm, 
D.  E.  Post,  L.  G.  Votta,  Physics  Today,  2005,  58  (1): 
P.35-41 


</> 

</> 

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Computational  Engineering  Code  Developer’s 
World  -  Six  Major  Challenges  and  Risks 


Zillions  of  complex 
processors  linked 
with  complicated 
and  slow  networks 
+  Little  help  for 
dealing  with  this 
complexity 


Problem  setup  (e.g. 
mesh  generation) 
takes  too  long  for 
rapid  design 
development 




Many  strongly  coupled  effects  and 
massively  parallel  computers 


Complex 
Computer 
Architectures 
And  Inadequate 
Tools 


Complex 
Science  and 
Mathematics. 


Large,  multi¬ 
disciplinary,  multi- 
institutional  teams 


Complex 
Organizations 


Lengthy 

Problem 

Setup 


Code 

Development 


Science  & 
User  Driven 
Requirements 


Rudimentary^ 

V&V 
Methods 

Immature  methods  and  few 
validation  experiments 
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Laws  of  nature  & 
user  needs  win 
every  time 


10/6/2009 


What are the characteristics of the DoD big codes?

Surveyed DoD codes to verify characterizations of CSE codes.

* Identify general characteristics

Questionnaire asked for:

* Contact information
* Code purpose
* Team size, number of users
* Domain Science area and sponsor
* Code size (SLOC)
  - Total and for each language
* Code history
  - How long did the code take to develop, and how old is it now?
* Platforms
* Degree of parallelism
* Computer time usage
* Memory requirements
* Algorithms
Surveyed the top 40 DoD codes (ordered by time requested), 15 responses


CTH 

93,435,421 

HYCOM 

89,005,100 

GAUSSIAN 

49,256,850 

ALLEGRA 

32,815,000 

ICEPIC 

26,500,000 

CAML 

21,000,000 

ANSYS 

17,898,520 

VASP 

18,437,500 

Xflow 

15,165,000 

ZAPOTEC 

12,125,857 

XPATCH 

23,462,500 

MUVES 

10,974,120 

MOM 

18,540,000 

OVERFLOW 

8,835,500 

COBALT 

14,165,750 

Various 

8,125,000 

ETA 

11,700,000 

CPMD 

5,975,000 

ALE3D 

5,864,500 

PRONTO 

5,169,100 

Application  Code  Hours 

5,200,100 
4,950,000 
5,719,000 
4,100,750 
4,578,430 
5,080,000 
5,500,000 
5,142,250 
4,700,000 
4,210,000 
3,955,610 
4,691,000 
2,420,000 
4,000,000 
4,050,000 
4,466,000 
3,800,000 
3,500,000 
3,600,600 

Freericks  Solver  2,600,000 


Application  Code 


Hours 


DMOL 
ICEM 
CFD++ 

ADCIRC 
MATLAB 
NCOM 
Loci-Chem 
GAMESS 
STRIPE 
USM3D 
FLUENT 
GASP 

Our  DNS  code  (DNSBLB) 

ParaDis 

FLAPW 

AMBER 

POP 

MS-GC 

TURBO 
Characteristics Aren't Surprising.

Attribute              Mean     Median
Team size (FTEs)       38       6
# users                5,038    27
Total SLOC (k)         820      275
SLOC Fortran 77        24%
SLOC Fortran 90, 95    34%
SLOC C                 17%
SLOC C++               13%
Other                  13%

❖ Even now, codes are developed by teams
❖ Most codes have more users than just the development team
❖ Codes are big
❖ 58% of the codes are written in Fortran.
❖ New languages with higher levels of abstraction are attractive, but they will have to be compatible and inter-operable with Fortran with MPI.
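The 58% Fortran figure appears to follow from the mean language shares in the table: Fortran 77 plus Fortran 90/95. The short Python sketch below reproduces that arithmetic using only the mean percentages from the slide; the variable names are illustrative and not part of the survey.

```python
# Mean share of source lines by language, as reported on the slide (percent).
mean_share = {
    "Fortran 77": 24,
    "Fortran 90/95": 34,
    "C": 17,
    "C++": 13,
    "other": 13,
}

# Combined Fortran share: every entry whose name starts with "Fortran".
fortran_share = sum(v for k, v in mean_share.items() if k.startswith("Fortran"))

# The shares are rounded on the slide, so they need not add up to exactly 100.
total_share = sum(mean_share.values())

print(f"Fortran share: {fortran_share}%")
print(f"Sum of all shares: {total_share}%")
```

The second number is a quick consistency check on the extracted table: a total close to, but not exactly, 100 is what rounded percentages would produce.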
Further Data Isn't Surprising Either

Attribute                               Mean            Median
Total project age (years)               19.8            17.5
Age of production version (years)       15.1            15.5
Total number of different platforms     6.9             7.0
Largest degree of parallelism           1000 to 3000    1000 to 3000
Typical minimum # of processors         225             128
Typical maximum # of processors         292             128
Is memory a limitation?                 Sometimes
Memory per processor (GBytes/proc)      0.75-4

• Most codes are at least 15 years old
• Most codes run on at least 7 different platforms
• Most codes can run on ~1000 processors, but don't
• Most users want at least 1 GByte / processor of memory.
5 detailed case studies of CSE codes make similar observations.

Attribute | Falcon | Hawk | Condor | Eagle
Application Domain | Product Performance | Manufacturing | Product Performance | Signal Processing
Project Duration | ~10 years (since 1995) | ~6 years (since 1999) | ~20 years (since 1985) | ~3 years
Number of Releases | 9 Production | 1 | 7 | 1
Earliest Predecessor | 1970s | early 1990s | 1969 | ?
Staffing | 15 FTEs | 3 FTEs | 3-5 FTEs | 3 FTEs
Customers | <50 | 10s | 100s | Demonstration code
Nominal Code Size | ~405,000 | ~134,000 | ~200,000 | <100,000
Primary Languages | F77 (24%), C (12%) | C++ (67%), C (18%) | Fortran 77 (85%) | C++, Matlab
Other Languages | F90, Python, Perl, ksh/csh/sh | Python, Fortran 90 | Fortran 90, C, Slang | Java Libraries (~70%)
Target Hardware | Parallel Supercomputers | Parallel Supercomputers | PCs to Parallel Supercomputers | Embedded App
Status | Production | Production ready | Production | Demonstration code
Sponsors | DOE | DoD | DoD | DoD
Software Development Tools were identified.


Falcon 

Hawk 

Condor 

Eagle 

Code  Development 
Environment 


Compilers 

Scripts 

Debuggers 

Performance  Monitoring 
Domain  Decomposition 


Execution  Environment 


Element  Generation 
Visualization 
Data  Analysis 


Code  Development  Process 
Tools 


Configuration  Management 
Bug  Tracking 

Code  Documentation 


Support  Libraries 


Computational  Mathematics 
Parallel  Programming  Libraries 


F77,  F90,  C 

Perl,  Python,  ksh,csh,sh, 
SCHEME, Gmake 

TotalView, 

SourceForge 

Pixie, DCPI,Speedshop, 
Prof 


CVS 


Web-based 


C++,C,  Fortran, Java 
Python 

Valgrind, gdb

Speedshop,  PAPI 
Metis 

CAD  ProE 
ICE,VTK,  Paraview, 
Tecplot 

XDMF  (supports 
Paraview) 


CVS 

Custom (~Bugzilla)


Doxygen 


F77,  F90 
None 


C++, Matlab, Java
csh, perl, make, cmake, ANT


TotalView, gdb    TotalView, gdb, DBX
None  Mercury  TATL 


In-house  tools 
CEI  Ensight, 
Paraview 


CVS 

None 

MS  Word 


PETc, 

VSS,PSPASES,CG  In-House  tools 


MPI 


MPI 


MPI 


N/A 

Matlab 

Matlab 


Perforce, 
Subversion 
no  formal  system 

In-code  comments 


FFTs 
MPI, PVL (~POOMA)


Nene 

F77,C 
C  Shell 

print+FTNCHK 

NetPIPE 

Data  basis  sets 
Local  product 
Local  Product 


Manual 

no  formal  system 
User  documentation; 
in-code  comments 


BLAS 

MPI,  TCP/IP 



Many Barriers and Challenges

❖ V&V
❖ Changing computer architectures
❖ Parallel Scaling and Parallel Programming Models
❖ Complexity of Domain Science (strongly-coupled multi-physics, multi-scale...)
❖ Cautious user community
  - Answers, not performance, are their ultimate goal
  - The better is the enemy of the good!
Recommendations

❖ Porting will be essential in the future
  - Not just for HPC but for all computer programs venturing into tomorrow's multi-core heterogeneous world
❖ Tools should facilitate porting. The most useful tools:
  - Reduce complexity
  - Hide complexity of computer in portable libraries
  - Simplify verification
  - Preserve ability to link to many languages
❖ Also useful to improve Software Engineering
  - Documentation, modularity, interface standards, interoperability, scalability


Software  reuse  in  acquisition  systems 


❖  Contractor-initiated  reuse 

-  Reuse  from  prior  developments  or  investments 

•  Product  line  approach  to  developing 
components  -  Fighter  A/C  radar  for  example 

-  Reuse  from  outside  sources 

•  Signal  processing  libraries  (VSIPL) 

•  Purchased  environments 

»  OS,  Middleware,  graphics  toolkits 

•  Open  Source 
Software reuse in acquisition systems


❖  Government-sponsored  reuse 

-  GFI  from  previous  developments 

•  Made  available  by  Gov’t  Purpose  Rights 

-  Contracted  collaborative  environments 

•  Central  code  repository  specific  to  an 
acquisition  or  domain 

•  Controlled  by  central  CM  agent 

-  GFI  from  purchase 

•  Government  purchased  license  made  available 
to  offerors  and  developers 

»  Tactical  Component  Network 
Issues

❖ Assurance
  - Is the reused code free from malware and vulnerabilities?
❖ Performance
  - How do we know what the code really does?
  - How does its use affect system properties?
  - Who is responsible? How much testing?
❖ Efficiency
  - Realized % reuse from previous developments rarely meets initial estimates!!
❖ Intellectual property
  - Can we reuse the patented technologies in the code?
  - Can we derive other works?
Back-up Slides

Most projects are at least 15 years old (and had predecessors).

[Figure: Code Project Age (July, 2006). Axis: Project age (years).]

• Almost all the codes that will run on platforms delivered within the next 5 years exist now.
Median code size is ~300,000 SLOC

[Figure: cumulative distribution of code size (single lines of code, SLOC). Vertical axis: logarithmic, 1000 to 10^7. Horizontal axis: Percent (1 to 99).]

Most codes will take 5 years or more to develop

1. D. E. Post and R. P. Kendall, International Journal of High Performance Computing Applications, 18 (2004), pp. 399-416
Median team size is 6 FTEs

[Figure: cumulative distribution of the size of the development team/user group. Horizontal axis: Percent (1 to 99).]

• Teamwork will be essential for new codes, especially for petaflop computing.
Median code runs on 7 different platforms

[Figure: histogram of survey item 10a, total number of different platforms the code runs on. Horizontal axis: Range (2 to 16). Vertical axis: Count.]

[Figure: cumulative distribution of survey item 10a, total number of different platforms the code runs on. Horizontal axis: Percent (1 to 99).]

• Code portability is a key, if not dominant, priority for code developers.
Median code has ~25 users.

[Figure: cumulative distribution of the number of active users. Horizontal axis: Percent (1 to 99).]


•  User  support  and  acceptance  will  be  essential  for  success 

•  Support  for  code  maintenance  will  be  essential! 
Median code is fairly parallel.

[Figure: cumulative distribution of survey item 11, Largest Degree of Parallelism. Vertical axis: Maximum degree of parallelism, by category. Horizontal axis: Percent (1 to 99).]

Categories:
8. > 30,000 processors
7. 10,001 to 30,000 processors
6. 3,001 to 10,000 processors
5. 1,001 to 3,000 processors
4. 300 to 1,000 processors
3. 101 to 300 processors
2. 11 to 100 processors
1. Less than 10 processors

• We have to scale from 100-3,000 processors to 50,000-200,000 processors in two years to achieve petaflop performance.
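The eight survey categories for largest degree of parallelism can be expressed as a simple lookup. The sketch below is illustrative only: the function name `parallelism_category` is hypothetical, and the handling of the boundary values the slide leaves ambiguous (exactly 10 processors, and 300, which appears in both categories 3 and 4) is an assumption.

```python
from bisect import bisect_left

# Inclusive upper bounds of categories 1-7 from the slide; category 8 is
# everything above 30,000 processors.
# Assumption: 10 goes to category 1 and 300 to category 3, since the slide
# does not say where those boundary values belong.
UPPER_BOUNDS = [10, 100, 300, 1_000, 3_000, 10_000, 30_000]


def parallelism_category(processors: int) -> int:
    """Return the survey category (1-8) for a given processor count."""
    if processors < 1:
        raise ValueError("processor count must be positive")
    return bisect_left(UPPER_BOUNDS, processors) + 1


if __name__ == "__main__":
    for n in (128, 2_000, 50_000):
        print(n, "->", "category", parallelism_category(n))
```

Under these assumptions the median "typical" processor count of 128 falls in category 3, while the reported largest degree of parallelism (1000 to 3000) corresponds to category 5 and the petaflop target of 50,000 or more to category 8.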
“Routine” processor count is much less

[Figure: cumulative distribution of routine processor count. Horizontal axis: Percent (1 to 99).]

• We have to scale from 30-200 processors to 20,000-200,000 processors in two years to achieve petaflop performance.
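The scale-up called for on this slide can be turned into a per-year growth factor. The sketch below uses only the processor counts quoted above (30-200 today, 20,000-200,000 needed, two years); treating growth as a constant multiplicative factor per year is an assumption made for illustration.

```python
YEARS = 2
current_low, current_high = 30, 200          # routine processor counts today
target_low, target_high = 20_000, 200_000    # counts quoted for petaflop runs

# Smallest jump: from the top of today's range to the bottom of the target.
min_factor = target_low / current_high
# Largest jump: from the bottom of today's range to the top of the target.
max_factor = target_high / current_low

# Assumed constant growth per year over the two-year window.
min_per_year = min_factor ** (1 / YEARS)
max_per_year = max_factor ** (1 / YEARS)

print(f"total scale-up: {min_factor:.0f}x to {max_factor:.0f}x")
print(f"per year: {min_per_year:.1f}x to {max_per_year:.1f}x")
```

Even the smallest jump is a hundredfold increase in routine processor count, which is roughly tenfold per year under the constant-growth assumption.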
58% of the codes are predominantly written in Fortran.

Attribute              Mean     Median
Team size (FTEs)       38       6
# users                5,038    27
Total SLOC (k)         820      275
SLOC Fortran 77        24%
SLOC Fortran 90, 95    34%
SLOC C                 17%
SLOC C++               13%
Other                  13%

❖ New languages with higher levels of abstraction are attractive, but they will have to be compatible and inter-operable with Fortran with MPI.
Most runs don't use a lot of processors

Attribute                               Mean            Median
Total project age (years)               19.8            17.5
Age of production version (years)       15.1            15.5
Total number of different platforms     6.9             7.0
Largest degree of parallelism           1000 to 3000    1000 to 3000
Typical minimum # of processors         225             128
Typical maximum # of processors         292             128
Is memory a limitation?                 Sometimes
Memory per processor (GBytes/proc)      0.75-4

• Most users want at least 1 GByte / processor of memory.
HPCMP TI-05 Application Benchmark Codes perform differently on different platforms.

❖ Aero - Aeroelasticity CFD code
  (Fortran, serial vector, 15,000 lines of code)
❖ AVUS (Cobalt-60) - Turbulent flow CFD code
  (Fortran, MPI, 19,000 lines of code)
❖ GAMESS - Quantum chemistry code
  (Fortran, MPI, 330,000 lines of code)
❖ HYCOM - Ocean circulation modeling code
  (Fortran, MPI, 31,000 lines of code)
❖ OOCore - Out-of-core solver
  (Fortran, MPI, 39,000 lines of code)
❖ CTH - Shock physics code (SNL)
  (~43% Fortran/~57% C, MPI, 436,000 lines of code)
❖ WRF - Multi-Agency mesoscale atmospheric modeling code
  (Fortran and C, MPI, 100,000 lines of code)
❖ Overflow-2 - CFD code originally developed by NASA
  (Fortran 90, MPI, 83,000 lines of code)
Performance depends on the computer and on the code.

Normalized Performance = 1 on the NAVO IBM SP3 (HABU) platform with 1024 processors (375 MHz Power3 CPUs), assuming that each system has 1024 processors.

GAMESS had the most variation among platforms.

[Figure: Code Performance (by machine). Codes: RFCTH2 Lg, RFCTH2 Std, Overflow2 Lg, Overflow2 Std, OOCore Lg, OOCore Std, HYCOM Lg, HYCOM Std, GAMESS Lg, GAMESS Std, Avus Lg, WRF Std. Machines: Cray X1, IBM P3, IBM P4, IBM P4+, HP SC40, SGI O3800, SGI O3900, Xeon Cluster (two configurations), SGI Altix, IBM Opteron.]

Substantial variation of codes for a single computer.

[Figure: Code performance (grouped by machine). Machines: SGI Altix, Xeon Cluster (3.4), Xeon Cluster (3.06), SGI O3900, SGI O3800, HP SC45, HP SC40, IBM P4+, IBM P4, IBM P3, Cray X1. Codes: AERO Std, WRF Std, Avus Std, Avus Lg, Gamess Std, HYCOM Std, HYCOM Lg, OOCore Std, OOCore Lg, Overflow2 Std, Overflow2 Lg, RFCTH2 Std.]

[Figure: Code Performance by machine]

[Figure: Relative code performance (SC 2005 panel, Tour de HPCycles)]

[Figure: Range of performance among machines for each code]
General conclusions

❖ Performance depends on application and on the computer
  - No computer works best for all applications
  - A suite of applications requires a suite of computer types
❖ Tuning for a platform can pay off in a big way
❖ Shared memory is really good for some codes
We made 9 observations based on detailed case studies.

❖ We made 9 observations from the five detailed case studies (Falcon, Hawk, Condor, Eagle, Nene).
  - These observations and conclusions were consistent with our prior, less detailed case studies.
❖ These 9 observations help identify the issues to focus on for petaflop applications.
Nine Cross-Study Observations


1.  Once  selected,  the  primary  languages  (typically  Fortran)  adopted  by  existing  code 
teams  do  not  change. 

2.  The  use  of  higher  level  languages  (e.g.  Matlab)  has  not  been  widely  adopted  by 
existing  code  teams  except  for  "bread-boarding"  or  algorithm  development. 

3.  Code  developers  in  existing  code  teams  like  the  flexibility  of  UNIX  command  line 
environments. 


4.  Third  party  (externally  developed)  software  and  software  development  tools  are 
viewed  as  a  major  risk  factor  by  existing  code  teams. 

5.  The  project  goal  is  scientific  discovery  or  engineering  design.  "Speed  to  solution" 
and  "execution  time"  are  not  highly  ranked  goals  for  our  existing  code  teams  unless 
they  directly  impact  the  science. 

6.  All  but  one  of  the  existing  code  teams  we  have  studied  have  adopted  an  "agile" 
development  approach. 

7.  For  the  most  part,  the  developers  of  existing  codes  are  scientists  and  engineers, 
not  computer  scientists  or  professional  programmers. 

8.  Most  of  the  effort  has  been  expended  in  the  "implementation"  workflow  step. 

9.  The  success  of  all  of  the  existing  codes  we  have  studied  has  depended  most  on 
keeping  their  customers  (not  always  their  sponsors)  happy. 
Summary of Code Attributes

[Figure: Code Attributes by Project Name (Falcon, Hawk, Condor, Nene). Series: number of languages, core team size, nominal age, lines of source code. Vertical scale up to 1,000,000. Inset photo: Nesochen sandvicensis, Pihea Ridge, Kauai, August 1, 2001.]
Codes primarily use one or two programming languages, but utilize many others for special purposes.

Attribute | Falcon | Hawk | Condor | Eagle | Nene
Application Domain | Product Performance | Manufacturing | Product Performance | Signal Processing | Process Modeling
Project Duration | ~10 years (since 1995) | ~6 years (since 1999) | ~20 years (since 1985) | ~3 years | ~25 years (since 1982)
Number of Releases | 9 Production | 1 | 7 | 1 | ?
Earliest Predecessor | 1970s | early 1990s | 1969 | ? | 1977-78
Staffing | 15 FTEs | 3 FTEs | 3-5 FTEs | 3 FTEs | ~10 FTEs + 100s of contributors
Customers | <50 | 10s | 100s | Demonstration code | ~100,000
Nominal Code Size | ~405,000 | ~134,000 | ~200,000 | <100,000 | 750,000
Primary Languages | F77 (24%), C (12%) | C++ (67%), C (18%) | Fortran 77 (85%) | C++, Matlab | Fortran 77 (95%)
Other Languages | F90, Python, Perl, ksh/csh/sh | Python, Fortran 90 | Fortran 90, C, Slang | Java Libraries (~70%) | C (1%)
Target Hardware | Parallel Supercomputers | Parallel Supercomputers | PCs to Parallel Supercomputers | Embedded App | PCs to Parallel Supercomputers
Status | Production | Production ready | Production | Demonstration code | Production
Sponsors | DOE | DoD | DoD | DoD | DoD, DOE, NSF
We found sparse use of Software Metrics

Metric (columns: Falcon, Hawk, Condor, Eagle)

Lines of code
Function points
Stories, project velocity
Cyclomatic complexity
Data coupling
Comment lines
Locality
Concurrency
Defect rates
Time-to-fix defects
Number of debug runs/unit time
Test Coverage
Frequency that regression testing uncovers problems
Code performance
Degree of performance optimization
Parallel scaling
Number of users
Number of production runs/unit time
Computer time for code development/unit time
Computer time for production/unit time


x 

x 


x 

x 

x 


x 

x 


x 

x 


x 

x 

x 
Cross-Study Observations

❖ Observation #1:
  - Once selected, the primary languages (typically F77) adopted by existing code teams do not change.
    • Any new language will have to be compatible with existing languages and will, at best, be introduced only incrementally.
    • Migration to a new version of the language (e.g. F77 to F90) often occurs, but seldom to a different language (e.g. F77 to C).

Early users of a petaflop machine will require stable Fortran, C and C++ implementations with MPI libraries on the new hardware.
Use of Higher-Level Languages

Attribute | Falcon | Hawk | Condor | Eagle | Nene
Application Domain | Product Performance | Manufacturing | Product Performance | Signal Processing | Process Modeling
Project Duration | ~10 years (since 1995) | ~6 years (since 1999) | ~20 years (since 1985) | ~3 years | ~25 years (since 1982)
Number of Releases | 9 Production | 1 | 7 | 1 | ?
Earliest Predecessor | 1970s | early 1990s | 1969 | ? | 1977-78
Staffing | 15 FTEs | 3 FTEs | 3-5 FTEs | 3 FTEs | ~10 FTEs + 100s of contributors
Customers | <50 | 10s | 100s | Demonstration code | ~100,000
Nominal Code Size | ~405,000 | ~134,000 | ~200,000 | <100,000 | 750,000
Primary Languages | F77 (24%), C (12%) | C++ (67%), C (18%) | Fortran 77 (85%) | C++, Matlab | Fortran 77 (95%)
Other Languages | F90, Python, Perl, ksh/csh/sh | Python, Fortran 90 | Fortran 90, C, Slang | Java Libraries (~70%) | C (1%)
Target Hardware | Parallel Supercomputers | Parallel Supercomputers | PCs to Parallel Supercomputers | Embedded App | PCs to Parallel Supercomputers
Status | Production | Production ready | Production | Demonstration code | Production
Sponsors | DOE | DoD | DoD | DoD | DoD, DOE, NSF
Cross-Study Observations

❖ Observation #2:
  - The use of higher level languages (e.g. Matlab) has not been widely adopted by existing code teams.
  - Higher level languages are utilized by some teams for “bread-boarding” and algorithm development, followed by implementation in a lower level, but higher performance, language.

“I’d rather be closer to machine language than more abstract. I know even when I give very simple instructions to a compiler, it does not necessarily give me machine code that corresponds to that set of instructions...”
  quote from Condor Technical Team Leader
Cross-Study Observations

❖ Observation #3:
  - Code developers like the predictability, flexibility and universality of UNIX command line environments.

One of the reasons that IDE tools are not used is that “they try to impose a particular style of development on me and I am forced into a particular mold.”
  quote from Eagle team leader

  - Any new IDE will need to meet these requirements
Risk Aversion to the use of 3rd Party Tools


Falcon 

Hawk 

Condor 

Eagle 

Code  Development 
Environment 


Compilers 

F77,  F90,  C 

C++,C,  Fortran, Java 

Scripts 

Perl, Python, ksh, csh, sh, 
SCHEME, Gmake 

Python 

Debuggers 

TotalView, 

SourceForge 

Valgrind,  gbd 

Performance  Monitoring 

Pixie, DCPI, Speedshop, Prof

Speedshop,  PAPI 

Domain  Decomposition 

Metis 

Execution  Environment 

Element  Generation 

CAD  ProE 

Visualization 

ICE, VTK, Paraview, Tecplot

Data Analysis

XDMF (supports Paraview)

Code  Development  Process 

Tools 

Configuration  Management 

CVS 

CVS 

Bug  Tracking 

Custom (~Bugzilla)

Code Documentation

Web-based

Doxygen

Support  Libraries 

Computational  Mathematics 

PETc, VSS, PSPASES, CG

Parallel  Programming  Libraries 

MPI 

MPI 

F77,  F90 


None 


TotalView,  gbd 


None 


In-house  tools 


CEI Ensight, Paraview


CVS 


None 


MS  Word 


In-House  tools 


MPI 


C++, Matlab, Java
csh, perl, make, cmake, ANT

TotalView,  gbd,  DBX 

Mercury  TATL 


N/A 

Matlab 

Matlab 


Perforce,  Subversion 
no  formal  system 

In-code  comments 


FFTs 

MPI, PVL (~POOMA)


Nene 


F77,C 


C  Shell 


print+FTNCHK 


NetPIPE 


Gaussian  basis  sets 


MACMOPLT 


MACMOPLT 


Manual 


no  formal  system 


User  documentation; 
in-code  comments 


BLAS 


MPI,  TCP/IP 




Performance Monitoring, Profile and Analysis Tools

AIMS (NASA Ames)
DCPI (DEC/Compaq/HP)
DEEP
Dimemas (CEPBA Barcelona)
Dynamic Probe Class Library (IBM)
DynaProf (Univ. of Tennessee)
FALCON (Georgia Tech)
HPC Toolkit (Rice Univ.)
HPM Toolkit (IBM)
Jumpshot (Argonne-DOE)
Monitor
MPIMAP (LLNL)
mpiP (ORNL/LLNL)
Pablo/SvPablo (Univ. Illinois/Univ. North Carolina)
PAPI Libraries (Univ. Tennessee)
Paradyn (Univ. Wisconsin)
Paraver (CEPBA Barcelona)
PDT (Univ. Oregon)
PE Benchmarker Toolset (IBM)
Performance Toolkit (IBM)
prof/gprof/tprof/pgprof
Quantify (Rational/IBM)
Speedshop (SGI)
Tau (Univ. of Oregon)
ThreadMon
Timescan (Etnus)
TRAPPER
WARTS
Vampir (Pallas, now Intel)
Xprofiler (IBM)
Debugging/Visualization Tools

❖ Debugging
  - TotalView (Etnus)
  - Gdb (gnu)
  - DBX
  - Ladebug
  - Great Circle (Geodesic)
❖ Visualization
  - CEI Ensight
  - Gnuplot
  - IDL
  - Kaleidagraph
  - Paraview

Cross-Study  Observations 


Observation  #4: 


-  Third  party  (externally  developed)  software  and  software 
development  tools  are  viewed  as  a  major  risk  factor  by 
existing  code  teams. 


The  greatest  concerns  are  for  parallel  debuggers,  problem 
set-up  tools,  linkers  and  loaders  with  ability  to  link  many 
languages,  performance  analysis  tools,  run  schedulers, 
visualization  and  data  analysis  tools,  testing  tools, 
smoother  upgrades  to  operating  systems 
Development Objectives

[Figure: bar chart of Development Objectives by code. Vertical axis: Importance (0 to 3). Objectives: Parallel Scalability, Execution Time, Portability, Speed to Solution, Code Reuse, Reducing Complexity, Maintainability. Codes: Falcon, Hawk, Condor, Eagle, Nene.]
Cross-Study Observations


❖ Observation #5:

-  The  principal  goal  of  our  development  teams  has  been 
scientific  discovery  or  engineering  design. 

-  “Speed  to  solution”  and  “execution  time”  are  not  the  most 
highly  ranked  goals  for  our  existing  code  teams  (except 
where  it  impacts  the  science). 


The  highest  ranked  common  goals  expressed  by  our  case 
study  participants  are:  codes  that  work,  provide  accurate  and 
credible  results  and  are  portable. 
Cross-Study Observations


❖  Observation  #6: 

-  All  but  one  of  the  existing  codes  studied  by  our  team  have 
adopted  an  “agile”  approach  to  code  development  without 
formal  software  engineering 

Hawk,  Condor,  Eagle  and  Nene  have  “agile”  teams  which 
emphasized  individuals  and  practices  over  processes  and 
tools;  Falcon  was  more  formal,  but  no  project  had  formal 
CMM  Level  2  certification 
Staffing Profiles

Staff Profile           Falcon   Hawk   Condor   Eagle   Nene
Scientists/Engineers    14       2      3        3       9
Computer Scientists     3        1      0        0       1
Total                   17       3      3        3       10


Cross-Study  Observations 


❖  Observation  #7: 


-  For  the  most  part,  the  developers  of  existing  codes 
are  scientists  and  engineers,  not  software  engineers 
or  professional  programmers. 

-  Many  developers  of  scientific  codes  are  also  the 
primary  users  of  those  codes. 


They  tend  to  be  suspicious  of  rigid  software 
engineering  methodologies,  preferring  the  “agile” 
approach.  Even  teams  with  long  histories  of 
collaboration  do  not  acknowledge  a  need  to  go 
beyond  CMM  Level  2 — they  emphasize  good  practices 
rather  than  good  processes 
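
Observation #7 can be checked against the Staffing Profiles table. The sketch below simply totals the staff counts reported there; the dictionary layout and variable names are illustrative choices, not part of the study.

```python
# Staff counts per project, copied from the Staffing Profiles table.
staff = {
    "Falcon": {"scientists_engineers": 14, "computer_scientists": 3},
    "Hawk":   {"scientists_engineers": 2,  "computer_scientists": 1},
    "Condor": {"scientists_engineers": 3,  "computer_scientists": 0},
    "Eagle":  {"scientists_engineers": 3,  "computer_scientists": 0},
    "Nene":   {"scientists_engineers": 9,  "computer_scientists": 1},
}

total_sci = sum(p["scientists_engineers"] for p in staff.values())
total_cs = sum(p["computer_scientists"] for p in staff.values())

# Fraction of all listed staff who are scientists or engineers.
sci_fraction = total_sci / (total_sci + total_cs)

print(f"scientists/engineers: {total_sci}, computer scientists: {total_cs}")
print(f"share of scientists/engineers: {sci_fraction:.0%}")
```

Across the five projects, scientists and engineers outnumber computer scientists by about six to one, which is consistent with the observation.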
Code      Analysis & Design   Implementation   Testing    Maintenance
Falcon    25%-35%             25%-35%          15%-30%    10%-30%
Hawk      25%                 40%              20%        15%
Condor    15%                 55%              15%        15%
Eagle     25%                 55%              15%        15%
Nene      35%                 45%              15%        5%


Cross-Study  Observations 


❖  Observation  #8: 

-  Over  the  lifetimes  of  the  existing  codes  studied  to  date,  most 
of  the  effort  has  been  expended  in  the  implementation 
workflow  step. 

-  Includes  implementation  during  a  long  production  phase 

-  A  successful  computational  science  and  engineering  code  is 
undergoing  continual  development  in  response  to  new  user 
requirements 
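
Observation #8 can be read directly off the effort table. The sketch below finds the largest workflow step for each project that reports single values; Falcon is left out because its entries are ranges. The data structure and names are illustrative assumptions.

```python
# Effort share per workflow step (percent), from the effort table, for the
# four projects that report single values. Falcon reports ranges and is omitted.
effort = {
    "Hawk":   {"Analysis & Design": 25, "Implementation": 40, "Testing": 20, "Maintenance": 15},
    "Condor": {"Analysis & Design": 15, "Implementation": 55, "Testing": 15, "Maintenance": 15},
    "Eagle":  {"Analysis & Design": 25, "Implementation": 55, "Testing": 15, "Maintenance": 15},
    "Nene":   {"Analysis & Design": 35, "Implementation": 45, "Testing": 15, "Maintenance": 5},
}

# Step with the largest share of effort for each project.
largest_step = {code: max(steps, key=steps.get) for code, steps in effort.items()}

# Row totals, as a sanity check on the extracted percentages.
row_totals = {code: sum(steps.values()) for code, steps in effort.items()}

for code in effort:
    print(f"{code}: largest step = {largest_step[code]}, total = {row_totals[code]}%")
```

Implementation is the largest step for all four projects. The Eagle row as extracted adds up to more than 100%, so at least one of its entries may have been misreported or misread.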
Customer satisfaction, not marketing, determines the success of the code.

❖ Observation #9:
  - The success or failure of a code depends on whether the code team can keep its customers satisfied.

Code  teams  that  helped  their  customers  succeed  in  their 
analysis,  predictions  or  research  were  successful.  The 
code  teams  that  didn’t,  found  that  their  codes  weren’t 
used  and  were  eventually  abandoned. 


Preliminary Observations from the Nene On-Site Interview

❖ Largest project yet (25 years old, ~20,000 downloads, ~100,000 users, 100s of contributors)
  - Over 5100 citations for primary code reference
    • Huge group of satisfied users!!!
  - Best example yet for the dominance of pragmatic practices over processes in scientific code development
  - Almost no role for formal software engineering
  - Similar to Open Source Model
    • Users download code from website and modify it to solve problems
    • Upgrades negotiated with central team
  - Funding agencies (all the major federal agencies that fund science) provide support for domain science, not explicitly for code development
Development Objectives

Objective              Falcon   Hawk     Condor   Eagle    Nene
Parallel Scalability   Medium   High     Medium   Medium   High
Execution Time         Medium   Medium   Medium   Medium   Medium
Portability            High     High     High     High     High
Speed to Solution      High     Medium   Medium   Medium   Medium
Code Reuse             Medium   High†    Medium   High     Medium
Reducing Complexity    Medium   Medium   High     High     High
Maintainability        High     Low‡     High     Low      High

† For new code
‡ Survey response; the code is highly maintainable, but this was not an explicit design goal

Note on Code Reuse: Medium implies that reuse is occasional; High implies that reuse is a project imperative.

Development Objectives

[Figure: bar chart of Development Objectives by code. Vertical axis: Importance. Objectives: Parallel Scalability, Execution Time, Portability, Speed to Solution, Code Reuse, Reducing Complexity, Maintainability. Codes: Falcon, Hawk, Condor, Eagle, Nene.]
Nene Software Development Practices

Practice: Requirements Development
  Description: Produce, analyze and verify customer, project and "product" requirements
  Degree Followed: Distributed management reduces the need and impact

Practice: Requirements Management
  Description: Manage requirements and identify inconsistencies with plan
  Degree Followed: same as above

Practice: Project Planning
  Description: Establish and maintain a plan that defines project activities
  Degree Followed: same as above

Practice: Project Monitoring & Control
  Description: Provide an understanding of the project's progress
  Degree Followed: No formal plan or deadlines

Practice: Configuration Management
  Description: Establish and maintain integrity of work products using config. mgt and control
  Degree Followed: Yes, tight control over program library

Practice: Process and Product Quality Assurance
  Description: Objectively evaluating adherence to process descriptions and resolving non-compliance
  Degree Followed: Tight control over contributed capabilities

Practice: Organizational Process Definition
  Description: Follow an organization-wide process
  Degree Followed: No, distributed mgt sets its own processes; well-defined process within core team
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Practices  (Continued) 


Practice: Organizational Training
  Description: Develop the skills and knowledge of staff so that they can perform their roles effectively
  Degree followed: An important output of this project is the training of graduate students

Practice: Risk Management
  Description: Identify potential problems before they occur and mitigate them
  Degree followed: Long track record of successfully managing risks

Practice: Peer Reviews
  Description: Software artifacts (requirements, design, code) reviewed by peers to improve quality
  Degree followed: Code is reviewed by PI before submission and inclusion into library

Practice: Planning Game
  Description: Quickly determine the scope of the next release with business priorities and technical estimates
  Degree followed: Not relevant

Practice: Frequent Deliveries/Small Releases
  Description: Frequent releases of the highest priority items
  Degree followed: No, delivery occurs when code is ready. Timing is driven by academic calendar
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Practices  (Continued) 


Practice: Simple Design
  Description: Design only what is being developed, little planning for future
  Degree followed: Yes, very decentralized planning done by 100's of contributors

Practice: Refactoring
  Description: Restructuring to remove duplication, improve communication, simplify or add flexibility
  Degree followed: Some, especially in areas of active development

Practice: Pair Programming
  Description: Two programmers work side-by-side at one computer, collaborating on coding
  Degree followed: Some, usually only in the feature integration phase, where PIs and students work together and in some cases side-by-side
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Practices  (Continued) 


Practice: Tacit Knowledge
  Description: Project knowledge is maintained in participants' heads rather than documents
  Degree followed: Yes, a great deal is published, but tacit knowledge is important

Practice: Collective Ownership
  Description: Anyone can change any code anywhere at any time
  Degree followed: No

Practice: On-site Customer
  Description: Include a real, live user on the development team
  Degree followed: Yes, even the core team members are users

Practice: Test-Driven Development
  Description: Module or method tests are written before or during coding
  Degree followed: Sometimes
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We  found  sparse  use  of  Software  Metrics 


Projects surveyed: Falcon, Hawk, Condor, Eagle, Nene. Each X is one project reporting use of the metric; rows with five marks cover all projects.

Metric                                                Projects marked
Lines of code                                         X X X X X
Function points
Stories, project velocity
Cyclomatic complexity
Data coupling
Comment lines                                         X X
Locality
Concurrency
Defect rates
Time-to-fix defects                                   X X
Number of debug runs/unit time
Test coverage
Frequency that regression testing uncovers problems   X X
Code performance                                      X X X X X
Degree of performance optimization                    X
Parallel scaling                                      X X X X
Number of users                                       X X X
Number of production runs/unit time
Computer time for code development/unit time
Computer time for production/unit time                X
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All  of  the  projects  made  use  of 

Testing. 


Case Study                 Falcon   Hawk     Condor   Eagle   Nene
Fraction of Code Tested    ~30%*    51-75%   51-75%   >75%    >75%

Remaining rows (entries listed in the order reported):

Conformance between scalar and parallel: n/a; <2%; no formal bounds; <10^-9 units

Conformance with experimental tests: <32%; no formal bounds; no formal bounds

Verification

•  Compare to exact answer: yes, yes, yes

•  Monitor conserved quantities: yes, yes, yes

•  Preservation of symmetries: yes, yes, yes

•  Compare with existing codes: yes, yes, yes, yes

•  Controlled experiments: yes, yes, yes

Regression Tests: yes, no, yes, yes

*Regression Tests




Workflows 


General  Phases  for  all  Life-Cycles  (after 


Code     Analysis & Design   Implementation   Testing    Maintenance
Falcon   25%-35%             25%-35%          15%-30%    10%-30%
Hawk     25%                 40%              20%        15%
Condor   15%                 55%              15%        15%
Eagle    25%                 55%              15%        15%
Nene     35%                 45%              15%        5%
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Reengineering  of  Legacy  Applications 


Technology  Roadmap  Workgroup 


Technology  Space 


1.  Program  Understanding 

2.  Modern  Replica 

3.  Testing  and  Verification 

4.  Human  Interfaces 

5.  Cost/Productivity  Assessment 
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Program  Understanding:  Technology 
Roadmap 

■ Techniques to identify the abstractions (extraction of abstractions)
  - Identifying known algorithms within applications
  - Identifying data-structures
  - Dependence analysis signatures

■ Static and dynamic analysis
  - Identification of invariants
  - Dynamic dependence analysis
  - Performance optimization

■ Learning (Pattern Recognition)
  - Natural language processing of comments and language construct names
  - Case based reasoning

■ Building the Modern Replica
  - Refactoring
  - Unsound transformations (adapted by programmer or learning algorithm)

■ Program visualization
  - Discovery
  - Data structure based visualization
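
The "identification of invariants" item can be illustrated with a small dynamic-analysis sketch in the spirit of likely-invariant detection. This is not a tool described in these slides: the function name `infer_invariants`, the small set of candidate invariants, and the sample snapshots are all assumptions chosen for illustration. The invariants it reports are only supported by the observed runs, not proven.

```python
from typing import Dict, List


def infer_invariants(traces: List[Dict[str, int]]) -> List[str]:
    """Return candidate invariants that hold on every observed snapshot.

    Each trace is a snapshot of variable values at one program point.
    The result is a set of *likely* invariants only.
    """
    if not traces:
        return []
    names = sorted(traces[0])
    found = []
    # Unary candidates: constant value, then non-negativity.
    for v in names:
        values = [t[v] for t in traces]
        if len(set(values)) == 1:
            found.append(f"{v} == {values[0]}")
        elif all(x >= 0 for x in values):
            found.append(f"{v} >= 0")
    # Binary candidates: ordering between each pair of variables.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if all(t[a] <= t[b] for t in traces):
                found.append(f"{a} <= {b}")
            elif all(t[a] >= t[b] for t in traces):
                found.append(f"{a} >= {b}")
    return found


# Hypothetical snapshots taken at a loop head of a legacy routine.
samples = [
    {"i": 0, "n": 8, "stride": 4},
    {"i": 4, "n": 8, "stride": 4},
    {"i": 8, "n": 8, "stride": 4},
]
invariants = infer_invariants(samples)
```

On these samples the sketch reports that `n` and `stride` are constant and that `i` stays between 0 and `n`, which is the kind of fact a programmer could confirm before it is recorded as part of the recovered abstraction.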
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Modern  Replica:  Technology  RoadMap 


■  Exciting  technologies 

What  can  be  achieved  using  them 
How  it  can  be  achieved  in  18  months 

■  However,  we  found  we  were 

“just  suggesting  we  solve  many  open  problems  in  computer 
science” 
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Modern  Replica:  Goal 


Technologies  to  remove  machine-specific  elements  of  code 
■  Certain  types  of  loop  unrolling 

Exploiting  pipeline  parallelism,  for  example 
Data  parallelization 
Pick  your  best  technique 
■  For  replicas,  one  must: 

Retain  information  for  optimization 

■  Can  (re-)generate  optimized  code 
Must  disallow  generated  executables 

■  Optimization  tweaks  must  be  first-class 
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Replica  Construction 


Different  languages  have  different  strengths 
■  Parallelism,  etc... 

Let the property of interest be your guide
Model  from  which  to  verify  properties 

Determinate  behavior 
■  Same  behavior  on  exact  same  input 
-  Easy  to  reproduce  defects,  a  help  to  debugging 
Seek  provable  properties  from  the  replica 

Models  of  parallelism 
CSP  or  DataFlow  or  ??? 
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Replica 


Challenges in avoiding the legacy pitfall with the Replica itself
Will the Replica be the legacy code in 10+ years?

Replica  must  be  easy  to  analyze 
General  representation  of  parallelism 

■ What could have been done differently when legacy code was
first written?

Early  optimization 

Hopes  for  automatic  parallelism  were  unfounded.  What  are  we 
assuming? 
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Testing  and  Verification:  Problem 
Statement 


■  Focused  problem:  establish  that  reengineered 
implementation  is  equivalent  to  legacy  version 

Subproblem:  establish  that  modern  replica  is  equivalent  to 
legacy  version 

■  Equivalence  w.r.t.  behavior  on  test  cases 

■ “Don’t care” cases, e.g., allow modern version to have
fewer bugs than legacy version

■ Safety net, e.g., insert assertions in modern version
when assumptions are made w.r.t. error handling
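
The three ideas above (equivalence on test cases, "don't care" cases, and assertions as a safety net) can be sketched together in a small differential-testing harness. Everything here is hypothetical: `legacy_mean`, `modern_mean`, `check_equivalence`, the `dont_care` predicate, and the tolerance are illustrative names and values, not part of any tool from the workshop.

```python
from typing import Callable, Iterable, List

Routine = Callable[[List[float]], float]


def legacy_mean(xs: List[float]) -> float:
    # Stand-in for legacy behavior: silently returns 0.0 on empty input.
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs) if xs else 0.0


def modern_mean(xs: List[float]) -> float:
    # Safety net: the error-handling assumption is made explicit.
    assert len(xs) > 0, "modern replica assumes non-empty input"
    return sum(xs) / len(xs)


def check_equivalence(
    legacy: Routine,
    modern: Routine,
    cases: Iterable[List[float]],
    dont_care: Callable[[List[float]], bool],
    tol: float = 1e-12,
) -> List[List[float]]:
    """Return the test inputs on which the two versions disagree."""
    mismatches = []
    for case in cases:
        if dont_care(case):
            continue  # behavior on this input is allowed to differ
        if abs(legacy(case) - modern(case)) > tol:
            mismatches.append(case)
    return mismatches


cases = [[1.0, 2.0, 3.0], [5.0], [], [0.5, 0.25]]
mismatches = check_equivalence(
    legacy_mean, modern_mean, cases, dont_care=lambda xs: len(xs) == 0
)
```

The empty input is declared a "don't care" case, so the modern version is free to reject it with an assertion rather than reproduce the legacy version's silent zero.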
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Testing  &  Verification:  Technology 
Roadmap 


■  Symbolic  execution 

Test  “equivalence”  of  two  versions  of  same  function 
Overcome  path  coverage  challenge  by  use  of  test  cases 
Test  for  k,h  degree  similarity 

■  Verification  of  semantic  equivalence 

Use  verifier  to  check  equivalence  of  constraints  from  two 
different  executions 

■  Definition  of  “don’t  care”  cases 

Use  test  cases  to  limit  behaviors  of  interest 
Add  assertions  to  modern  replica 

■  Loops  --  overcoming  major  challenge  in  verification 

Use  dynamic  analysis  to  distinguish  between  loops  with  few 
vs.  large  #  iterations 
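
A minimal sketch of the last item, assuming manual instrumentation: each loop's iterable is wrapped so that iteration counts are recorded during test runs, and loops are then classified as having few or many iterations. The helper names (`counted`, `classify`), the threshold of 8, and the `checksum` kernel are assumptions for illustration. One plausible use is to let a verifier fully unroll the "few" loops and reserve summarization for the rest.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, TypeVar

T = TypeVar("T")

# Maximum iterations observed per loop label across all profiled runs.
max_iterations: Dict[str, int] = defaultdict(int)


def counted(label: str, iterable: Iterable[T]) -> Iterator[T]:
    """Wrap a loop's iterable and record how many iterations it ran."""
    count = 0
    for item in iterable:
        count += 1
        # Update inside the loop so early exits (break) are still recorded.
        if count > max_iterations[label]:
            max_iterations[label] = count
        yield item


def classify(threshold: int = 8) -> Dict[str, str]:
    """Label each observed loop 'few' or 'many' by its largest iteration count."""
    return {
        label: "few" if n <= threshold else "many"
        for label, n in max_iterations.items()
    }


def checksum(block: bytes) -> int:
    # Hypothetical legacy kernel with one short loop and one long loop.
    acc = 0
    for k in counted("header", range(4)):
        acc ^= k
    for b in counted("payload", block):
        acc = (acc + b) % 65521
    return acc


# Run the kernel on a few test inputs to collect the loop profile.
for size in (16, 256, 1024):
    checksum(bytes(size))

loop_classes = classify()
```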
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Human  Interfaces:  DARPA-hard 
problems 


1)  Observation:  three  kinds  of  expertise  are  needed  to 
both  develop  and  port  quality  HPC  code:  (i)  domain 
knowledge,  (ii)  numerical  methods,  (iii)  parallel 
programming.  Either  you  have  one  programmer  who 
knows  them  all  or  your  experts  must  effectively 
communicate.  [Yes,  there  is  an  expertise  gap  in  HPC 
programming.] 

Problem  statement:  Develop  technology  and/or 
social  methods  that  reduce  the  need  for  this  breadth 
of  expertise  or  make  the  communication  among 
experts  easier. 
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Human  Interfaces:  DARPA-hard 
problems  (contd) 


2)  Observation:  Where  do  specifications/abstractions 
come  from?  Programmers  are  often  aware  what 
these  abstractions  are,  but  it  is  tedious  or  not 
economical  to  write  them  down,  because  they  need  to 
be formally stated. Example: who specifies, and how,
that a library routine sorts the input array?

Problem  statement:  Develop  tools  to  help 
programmers  infer  specifications  of  modules  in  their 
code.  Develop  incentives  that  encourage 
programmers  to  use  these  tools. 
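
To make the sorting example concrete, the sketch below writes the specification "the routine sorts the input array" as an executable check and tests it against observed inputs. It only illustrates what a stated specification might look like; it does not infer one. The names `satisfies_sort_spec`, `legacy_sort`, and `spec_holds_on` are hypothetical, and the insertion sort merely stands in for an undocumented library routine.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence


def satisfies_sort_spec(inp: Sequence[int], out: Sequence[int]) -> bool:
    """Executable form of 'the routine sorts the input array'."""
    ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    same_elements = Counter(inp) == Counter(out)  # output is a permutation
    return ordered and same_elements


def legacy_sort(a: List[int]) -> List[int]:
    # Hypothetical stand-in for an undocumented routine (insertion sort).
    a = list(a)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


def spec_holds_on(
    routine: Callable[[List[int]], List[int]],
    inputs: Iterable[List[int]],
) -> bool:
    """Check the candidate specification against every observed input."""
    return all(satisfies_sort_spec(x, routine(x)) for x in inputs)


observed_inputs = [[3, 1, 2], [], [5, 5, 1], [9, 8, 7, 6]]
spec_ok = spec_holds_on(legacy_sort, observed_inputs)

# The permutation clause matters: an ordered but truncated output fails.
spec_rejects_truncation = not satisfies_sort_spec([3, 1, 2], [1, 2])
```

Both clauses are needed. Checking only that the output is ordered would accept a routine that drops elements, which is one reason such specifications are tedious to state by hand.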
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Human  Interfaces:  DARPA-hard 
problems  (contd) 


3) Observation: We believe that porting HPC code is
expensive in part because code transformations are
repetitive and performed manually.

Problem  statement:  Develop  methods  that 
automate  these  transformations.  These  methods 
must  be  easily  programmable  by  the  programmer. 

For  example,  they  can  be  programmed  by 
demonstration. 
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Cost/Productivity  Assessment: 
Technology  Roadmap 


■  Use  of  performance  profiles  to  identify  software 
components  that  need  less  vs.  more  attention  from  a 
performance  viewpoint 

■  Ethnographic  studies  to  identify  easy  vs.  hard  steps  in 
manual  process 
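
The first item can be sketched with Python's standard `cProfile` and `pstats` modules: profile one run, then rank components by the time spent in them. The three functions are hypothetical stand-ins for legacy components, and ranking by each function's own total time is an assumption about what "attention" should mean here.

```python
import cProfile
import pstats


def parse_header(n: int) -> int:
    # Cheap component: little to gain from modernizing it first.
    return sum(range(n))


def stencil(n: int) -> float:
    # Expensive component: dominates the run.
    acc = 0.0
    for i in range(n):
        acc += (i % 7) * 0.5
    return acc


def legacy_main() -> None:
    parse_header(100)
    stencil(300_000)


profiler = cProfile.Profile()
profiler.runcall(legacy_main)

# pstats.Stats.stats maps (file, line, function name) to
# (primitive calls, total calls, total time, cumulative time, callers).
stats = pstats.Stats(profiler).stats
total_time = {func[2]: entry[2] for func, entry in stats.items()}

# Components ordered from most to least time spent in their own code.
ranking = sorted(
    ("parse_header", "stencil"),
    key=lambda name: total_time[name],
    reverse=True,
)
```

Here `stencil` ranks first, so it would receive modernization effort before `parse_header`.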
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Technology  Space 


1.  Program  Understanding 

2.  Modern  Replica 

3.  Testing  and  Verification 

4.  Human  Interfaces 

5.  Cost/Productivity  Assessment 
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Ill  Milestones  and  Evaluation 


• Reduced cost (by 100x to 1000x)

-  If  the  process  is  fully  automated  ->  trivial 

-  Programmer  involvement  ->  user  studies? 

•  Can  we  find  a  set  of  applications  with  original  legacy  version  and  a  version  modernized 
using  current  practices  that  can  be  used  as  the  base  case? 

•  Reduction  of  errors  and  deviations 

-  What  is  a  good  measurement? 

• Modernized replica maps trivially onto multiple modern architectures

- “trivially map”: automated tools, no programmer intervention

- “multiple modern architectures”: at least one distributed-memory multicore (Cell or Tile64) and
one shared-memory multicore (Core 2 Duo or Niagara)

•  Modernized  replica  is  efficient  and  effective 

-  Show  speedups  against  the  original  program  on  that  architecture 

-  Show  scalability  from  one  core  to  max  number  of  cores  available 

• Modernized replica is easy to understand and manage

-  How  to  measure  quality  of  the  specifications  created? 

-  How  to  measure  malleability  and  extendibility  of  original  vs.  modernized  replica? 

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Metrics 


Performance 

-  Speedup  on  several  architectures 

-  Minimal  performance  level  for  acceptance;  not  used  for 
comparing  teams 

Productivity 

-  Measured  in  programmer  time,  not  SLOC  produced 

-  Primary  metric  for  comparison  of  teams 

-  Depends  on  expertise  of  programmers 

Maintainability 

-  Code  size  of  modernized  replica 

Flexibility 

-  Replica  easily  targeted  to  new  architectures  with  new 
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Phase  1  Benefits  Measurement 


*  A  referee  team  establishes  the  rules  of  the  game 

-  Example: 

•  a  benchmark  program  (possibly  several,  since  proposers  may  use 
different  source  languages);  ~10k  SLOC? 

•  multiple  target  architectures 

•  scalability  improvement:  2x  faster  than  unchanged  legacy  code  on 
target  arch 

•  maintainability  improvement:  2x  smaller  SLOC  than  legacy  code 

•  proposers  have  a  week  to  train  a  small  team  of  programmers  (grad 
students?,  independently  hired)  on  their  tool 

•  programmers  have  a  month  to  do  the  port  to  the  modern  replica 

•  measure  time  to  do  the  port 

-  Metrics  for  other  DoD  projects  are  available 

•  Reuse  their  procedures? 

*  Overall  goal:  2x  reduction  in  programmer  time  relative  to 
best-practices  hand  porting 
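
The example rules above reduce to a few ratios, sketched below as a small acceptance check. The `TrialResult` fields, the `evaluate` function, and all numbers are made up for illustration and are not results from any trial. The single `factor` of 2.0 follows the 2x figures quoted in the example rules.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    legacy_runtime_s: float    # unchanged legacy code on the target arch
    replica_runtime_s: float   # modern replica on the same arch
    legacy_sloc: int
    replica_sloc: int
    tool_port_hours: float     # programmer time using the proposer's tool
    hand_port_hours: float     # best-practices hand porting baseline


def evaluate(r: TrialResult, factor: float = 2.0) -> dict:
    """Compute the three improvement ratios and compare them to the target."""
    speedup = r.legacy_runtime_s / r.replica_runtime_s
    size_reduction = r.legacy_sloc / r.replica_sloc
    time_reduction = r.hand_port_hours / r.tool_port_hours
    ratios = (speedup, size_reduction, time_reduction)
    return {
        "speedup": speedup,
        "size_reduction": size_reduction,
        "time_reduction": time_reduction,
        "meets_phase1": all(x >= factor for x in ratios),
    }


# Made-up numbers for a ~10k SLOC benchmark.
report = evaluate(TrialResult(120.0, 48.0, 10_000, 4_500, 70.0, 160.0))
```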
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Phases  and  Milestones 


*  Phase  1  (first  18  months) 

-  Deliverable:  prototype  process  and  tools 

- Measurement: 1-month comparison trial

-  Goal:  2x  improvement  in  speedup,  porting  time,  code  size 

-  Critical  design  review 

-  Cut  down  proposing  teams 

*  Phase  2  (next  18  months) 

- Goal: 10x aggregate improvement (scalability, portability, size)

-  Flexibility:  how  much  time  to  take  advantage  of  a  completely  new 
architecture? 

-  Larger  codes:  6-month  comparison  trial  using  50k  SLOC? 

-  Cutting  down  teams  to  one  representation  (selected  for  handoff  to  a 
standardization  process?) 
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•  Proposer  picks: 

-  the  modern  replica  representation 

•  it  should  probably  already  exist,  because  Phase  1  isn’t 
long  enough  to  develop  it 

•  Referee  picks: 
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